Abstract


This thesis addresses two questions related to language. First, how do children learn the
language-specific components of their native language? Second, how is language grounded in perception?
These two questions are intimately related. One piece of language-specific information which children
must learn is word meanings. Knowledge of the meanings of utterances containing unknown words presumably
aids children in the process of determining the meanings of those words. A complete account of such
a process must ultimately explain how children extract utterance meanings from their non-linguistic
context. In the first part of this thesis I present precisely formulated algorithms which attempt to
answer the first question. These algorithms utilize a cross-situational learning strategy whereby the
learner finds a language model which is consistent across several utterances paired with their
non-linguistic context. This allows the learner to acquire partial knowledge from ambiguous situations
and combine such partial knowledge across situations to infer a unique language model despite the
ambiguity in the individual isolated situations. These algorithms have been implemented in a series of computer
programs  which  test  this  cross-situational  learning  strategy  on  linguistic  theories  of  successively  greater 
sophistication.  In  accord  with  current  hypotheses  about  child  language  acquisition,  these  systems  use 
only positive examples to drive their acquisition of a language model. Maimra, the first program
described,  learns  word-to-meaning  and  word-to-category  mappings  from  a  corpus  pairing  utterances 
with  sets  of  expressions  representing  the  potential  meanings  of  those  utterances  hypothesized  by  the 
learner  from  the  non-linguistic  context.  Maimra's  syntactic  theory  is  embodied  in  a  fixed  context- 
free  grammar.  Davra,  the  second  program  described,  extends  Maimra  by  replacing  the  context-free 
grammar with a parameterized variant of X-bar theory. Given the same corpus as Maimra, Davra learns
the parameter settings for X-bar theory in addition to a lexicon mapping words to their syntactic category
and  meaning.  Davra  has  been  successfully  applied,  without  change,  to  tiny  corpora  in  both  English 
and  Japanese,  learning  the  requisite  lexica  and  parameter  settings  despite  differences  in  word  order 
between  the  two  languages.  Kenunia,  the  third  program  described,  incorporates  a  more  comprehensive 
model of universal grammar supporting movement, adjunction, and empty categories, as well as more
extensive parameterization of its X-bar theory component. This model of universal grammar is based on
recent linguistic theory and includes such notions as the DP hypothesis, VP-internal subjects, and V-to-I
movement. Kenunia is able to learn the parameter settings of this model, as well as word-to-category
mappings, in the presence of movement and empty categories. The algorithms underlying Maimra,
Davra, and Kenunia are presented in detail along with annotated examples depicting their operation
on sample learning tasks.
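The cross-situational strategy summarized above can be illustrated with a deliberately simplified sketch. It assumes, purely for illustration, that each utterance is a list of words paired with a set of atomic conceptual symbols hypothesized from context, and that each symbol is expressed by exactly one word. The function name and the toy corpus are hypothetical; this is not Maimra's algorithm, which also learns word-to-category mappings and works with structured meaning expressions.

```python
def cross_situational(corpus):
    """Infer candidate word meanings from (words, context_symbols) pairs.

    Step 1: a word may only mean a symbol present in the context of every
    utterance containing that word (intersection across situations).
    Step 2 (assumption: each symbol is expressed by exactly one word):
    remove symbols already uniquely claimed by other words, until stable.
    """
    candidates = {}
    for words, context in corpus:
        for word in set(words):
            if word in candidates:
                candidates[word] &= set(context)
            else:
                candidates[word] = set(context)

    changed = True
    while changed:
        changed = False
        settled = {w: next(iter(c)) for w, c in candidates.items() if len(c) == 1}
        for word, cands in candidates.items():
            if len(cands) > 1:
                claimed = {s for w, s in settled.items() if w != word}
                pruned = cands - claimed
                if pruned and pruned != cands:
                    candidates[word] = pruned
                    changed = True
    return candidates


corpus = [
    ("john walked to school".split(), {"JOHN", "WALK", "TO", "SCHOOL"}),
    ("mary walked home".split(), {"MARY", "WALK", "HOME"}),
    ("john ran home".split(), {"JOHN", "RUN", "HOME"}),
]
lexicon = cross_situational(corpus)
for word in sorted(lexicon):
    print(word, sorted(lexicon[word]))
```

Taken alone, each utterance leaves every word ambiguous. Combined, the three situations fix john, walked, home, mary, and ran, while to and school remain ambiguous between TO and SCHOOL because they only ever occur together.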

In the second part of this thesis I present a novel approach to event perception, the process of
determining when events described by simple spatial motion verbs such as throw, pick up, put, and walk
occur in visual input. This approach is motivated by recent experimental studies of adult visual perception
and infant knowledge of object permanence. In formulating this approach I advance three claims about
event perception and the process of grounding language in visual perception. First, I claim that the
notions of support, contact, and attachment play a central role in defining the meanings of simple spatial
motion verbs in a way that delineates prototypical occurrences of events described by those verbs from
non-occurrences. Prior approaches to lexical semantic representation focused primarily on movement
and lacked the ability to incorporate these crucial notions into the definitions of simple spatial motion
verbs. Second, I claim that support, contact, and attachment relations between objects are recovered
from images by a process of counterfactual simulation. For instance, one object supports another object
if the latter does not fall when the short-term future of the image is predicted, but does fall if the former
is removed. Such counterfactual simulations are performed by a modular imagination capacity. Third,
I claim that this imagination capacity, while superficially similar in intent to traditional kinematic
simulation, is actually based on a drastically different foundation. This foundation takes the process of
enforcing naive physical constraints such as substantiality, continuity, and attachment relations between
objects to be primary. In doing so it sacrifices physical accuracy and coverage. This is in contrast to the
traditional approach, which achieves physical accuracy and coverage by numerical integration, relegating
the maintenance of constraints to a process of secondary importance built around the numerical
integration core. A simplified version of this theory of event perception has been implemented in a program
called Abigail which watches a computer-generated animated movie and produces a description of the
objects and events which occur in that movie. Abigail's event perception processes rely on counterfactual
simulation to recover changing support, contact, and attachment relations between objects in
the movie. Prior approaches to this task were based solely on determining the spatial relations between
objects in the image sequence, grounding verb meanings in static geometric predicates used to compute
those spatial relations without counterfactual analysis. The detailed algorithms underlying the novel
implementation are presented along with annotated examples depicting its analysis of sample movies.
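The counterfactual definition of support given above can be made concrete with a toy sketch. It assumes a hypothetical grid world of stacked blocks in which the only predicted short-term change is falling under gravity. Abigail's imagination capacity, which reasons about line segments, circles, joints, and layers, is far richer; the names Block, falls, and supports are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    column: int
    height: int  # 0 means the block rests on the ground


def falls(name, blocks):
    """Crude short-term prediction (toy assumption): a block stays put only if
    the chain of blocks directly beneath it reaches the ground without a gap."""
    by_cell = {(b.column, b.height): b for b in blocks}
    block = next(b for b in blocks if b.name == name)
    while block.height > 0:
        below = by_cell.get((block.column, block.height - 1))
        if below is None:
            return True  # nothing underneath: the block would fall
        block = below
    return False


def supports(a, b, blocks):
    """a supports b if b does not fall in the predicted future of the scene,
    but does fall once a is imagined away (counterfactual simulation)."""
    if a == b:
        return False
    without_a = [blk for blk in blocks if blk.name != a]
    return not falls(b, blocks) and falls(b, without_a)


scene = [Block("table", 0, 0), Block("box", 0, 1), Block("cup", 0, 2), Block("ball", 1, 0)]
print(supports("table", "cup", scene))
print(supports("ball", "cup", scene))
```

This prints True and then False: removing the table leaves a gap beneath the box that holds the cup, whereas removing the ball changes nothing about the cup. Under this toy prediction, indirect support (table under box under cup) counts as support.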


Thesis  Supervisor:  Robert  C.  Berwick 

Title:  Associate  Professor  of  Computer  Science  and  Engineering 




Dedicated to the memory of my mother

Edna  Roskin  Siskind 




Acknowledgments

Last week sometime, I met a young girl while eating at the KK. To spark conversation, I asked her what
grade she was in, to which she promptly replied 'first, and you?'. Feeling obliged to respond, I thought
for a moment and answered 'twenty-fifth'. I am in twenty-fifth grade, give or take a few depending
on how you count. Family members and friends often prod me, asking when I am going to finish my
thesis, get out of school, and get a real job. That just misses the point: there is little more to life than
learning, and children. Perhaps that is why I choose child language acquisition as my research topic.

The  acknowledgment  page  for  my  masters  thesis  ended  with  the  following  statement. 

I am far too ashamed to acknowledge my family, friends, and teachers on such a meager
and paltry document. That will have to wait for my Ph.D. Perhaps then I can present a
document  worthy  of  acknowledging  them. 

Getting my masters degree was a long and arduous task. Both the research and the writing were
unfulfilling. Getting my Ph.D. was different. The research was pleasurable; writing the document was
agonizing. One thing I can say, however, is that I am much more content with the results. So I feel
comfortable  delivering  on  my  past  promise. 

Before doing that, however, I must first gratefully acknowledge the generous support I have received
for my research from a number of sources during my graduate career. Those sources which contributed
to the work described in this document include AT&T, for a four year Ph.D. fellowship; Xerox
Corporation, for affording me the opportunity to work on this project during my more recent visits to
PARC; a Presidential Young Investigator Award to Professor Robert C. Berwick under National Science
Foundation Grant DCR-85552543; a grant from the Siemens Corporation; and the Kapor Family Foundation.
The fact that this generous funding has come with no strings attached has allowed me to approach
this research with child-like playfulness and to continue schooling through the twenty-fifth grade. I am
indebted to Patrick Winston, the principal of the MIT AI Lab, who, along with Peter Szolovits and
Victor Zue, encouraged me to pursue this research.

While writing my masters thesis, I wasted several hours of Xerox's money recalling the names of
my grade school teachers and preparing the LaTeX that prints the next page. That list omits many
other teachers, as well as family and friends, whose names are far too numerous to list. Their names are
included herein by reference (or however you say that in legalese). Several people, however, deserve special
mention: my parents, for long ago giving up hope that I would ever finish; Naomi, Yaneer, Shlomiya,
Yavni, and Maayan Bar-Yam, Tova, Steve, Asher, and Arie Greenberg, and Avi, Audi, Zecharia, and Yael
Klausner, for being there to comfort me when I thought I would never finish; and Jeremy Wertheimer,
Jonathan Amsterdam, and Carl de Marcken for proofreading this document as I was finishing. Jeremy
and Carl helped unwedge me at key times during the research and writing that went into this thesis.

Lui Collins once commented that she was fortunate to have never been so unwise as to incorporate
the name of a lover in a song she had written, thus avoiding the pain and embarrassment of needing to
perform that song after the relationship was over. At the risk of being foolish, there is one additional
friend whom I would like to thank explicitly. Beth Kozinn has been a source of support throughout
much of this thesis project. She was one of the few people who understood and appreciated the passion
I have for my work and was even fascinated by the idea that I spent my days, and nights, trying to
get a computer to understand cartoons so simple that any two year old would find them boring. Yet
she herself always wanted to know what John and Mary were doing in frame 750. Beth taught me a
lot about contact. This helped me get through the anguish of applying for, and not being offered, an
academic position. I wish I could teach her something about attachment.

In summary I attribute this thesis to all of my teachers, from Ms. Solomon, my nursery school
teacher, through David McAllester and Bob Berwick, my twenty-fifth grade teachers, to Lila Gleitman,
who will be my twenty-sixth grade teacher. Someone once said that the only reason people have kids
is so that they have an excuse to break out of the pretense of being an adult and to act like a child
in their presence. Perhaps the only reason people go to school is for the privilege and obligation of
acknowledging and honoring one's teachers. I hope that childhood and schooling never cease so that I
will forever retain that privilege and obligation.


To: 

Ms. Solomon, my nursery school teacher
Mrs.  Miller,  my  kindergarten  teacher 
Mrs.  Savrin,  my  first  grade  teacher 
Miss Kallman, my second grade teacher
Mrs. Theodor, my third grade teacher
Mrs. Keogh, my teacher for EAR I and II

my  English  teachers: 

Mrs.  Goldberg,  for  seventh  grade 
Mr. Bershaw, for eighth grade
Mr. Iglio, for ninth grade
Mrs.  Johnson,  for  tenth  grade 
Mr.  Kraus,  for  eleventh  grade 
Mr.  Taussig,  for  twelfth  grade 
Mr. Guy, for twelfth grade

my  Spanish  teachers: 

Miss di Prima, for seventh grade
Miss  Murphy,  for  eighth  grade 
Mrs.  Gomez,  for  tenth  grade 

my  Social  Studies  teachers: 

Mrs.  Caldera,  for  seventh  grade 
Mr. Markheld, for eighth grade
Mr. Lerman, for ninth grade
Mrs.  Cohen,  for  tenth  grade 
Dr.  Kelly,  for  eleventh  grade 


my  Music  teachers: 

Mr. Sepe.

Mr.  de  Silva. 

Mr.  Carubia. 

Mr.  Katz. 

my Science teachers:

Mrs.  Baron,  for  seventh  grade 
Mr. Sponenberg, for Biology
Mr.  Palazzo,  for  Physics 

my  Mathematics  teachers: 

Mr.  Gould,  for  seventh  grade 
Mr. Day, for Algebra
Mr. Fitzgerald, for Geometry
Mr. Okun, for Trigonometry
Mr. Metviner, for eleventh grade
Mr. Gorman, for Calculus


and with particular respect and fondness to:

Mr.  Silver,  my  eighth  grade  Science  teacher,  from  whom  I  learned 
conviction,  cynicism,  precision,  and  the  love  of  the  pursuit 
of  knowledge. 


and  to: 


Mr. Gerardi, my Chemistry teacher and mentor for teacher-student
relationships.  May  I  be  so  fortunate  to  relate  to  my  future 
students,  and  be  their  role  model,  as  well  as  he  related  to  his. 


and  to: 


Mrs. Jagos, my ninth grade Spanish teacher, for teaching me
friendship, and how to be a mensch.


Contents 


1 Overview

I Language Acquisition

2 Introduction
2.1 The Bootstrapping Problem
2.2 Outline
3 Cross-Situational Learning
3.1 Linking and Fracturing
3.2 Learning Syntactic Categories
3.3 Learning Syntactic Categories and Word Meanings Together
4 Three Implementations
4.1 Maimra
4.2 Davra
4.2.1 Alternate Search Strategy for Davra
4.3 Kenunia
4.3.1 Overview of Kenunia
4.3.2 Linguistic Theory Incorporated in Kenunia
4.3.3 Search Strategy
4.3.4 The Parser
4.3.5 Additional Restrictions
4.3.6 Kenunia in Operation
5 Conclusion
5.1 Related Work
5.1.1 Semantic Bootstrapping
5.1.2 Syntactic Bootstrapping
5.1.3 Degree 0+ Learning
5.1.4 Salveter
5.1.5 Pustejovsky
5.1.6 Rayner et al.
5.1.7 Feldman
5.2 Discussion



II Grounding Language in Perception

6 Introduction
6.1 The Event Perception Task
6.2 Outline
7 Lexical Semantics
8 Event Perception
8.1 The Ontology of Abigail's Micro-World
8.1.1 Figures
8.1.2 Limitations and Simplifying Assumptions
8.1.3 Joints
8.1.4 Layers
8.2 Perceptual Processes
8.2.1 Deriving the Joint and Layer Models
8.2.2 Deriving Support, Contact, and Attachment Relations
8.3 Experimental Evidence
8.4 Summary
9 Naive Physics
9.1 Simulation Framework
9.2 Translation and Rotation Limits
9.3 Complications
9.3.1 Clusters
9.3.2 Tangential Movement
9.3.3 Touching Barriers
9.3.4 Tolerance
9.4 Limitations
9.5 Experimental Evidence
9.6 Summary
10 Conclusion
10.1 Related Work
10.1.1 Kramer
10.1.2 Funt
10.2 Discussion
A Maimra in Operation
B Kenunia in Operation
C Abigail in Operation


List  of  Figures 


1.1 A generic language processing architecture
1.2 The English corpus presented to Davra
1.3 Davra's output
1.4 Abigail's movie
1.5 Imagining frame 11
2.1 A generic language processing architecture
2.2 Cross-situational learning architecture
3.1 Jackendoff's linking rule
3.2 Analyses of five and six word utterances
3.3 Weak cross-situational learning of syntactic categories
3.4 Consistent but incorrect analyses after strong cross-situational syntax
3.5 All submeanings of the meaning expressions in the sample corpus
3.6 Word meanings inferred by weak cross-situational semantics
4.1 Maimra's grammar
4.2 The English corpus presented to Maimra and Davra
4.3 Maimra's output
4.4 Davra's search strategy
4.5 Davra's output for the English corpus
4.6 The Japanese corpus presented to Davra
4.7 Davra's output for the Japanese corpus
4.8 Kenunia's corpus
4.9 Kenunia's parser
4.10 Kenunia's prior semantic knowledge
4.11 Kenunia's output
5.1 The technique used by Rayner et al. (1988)
6.1 A typical movie frame
6.2 Abigail's language faculty
6.3 A movie script
6.4 Abigail's movie
7.1 Different varieties of support relationships
7.2 Jackendoff's linking rule
7.3 Borchardt's definitions




8.1 The touch and overlap relations
8.2 Event perception architecture
8.3 The algorithm for updating the joint model
8.4 The first twelve frames of Abigail's movie
8.5 Imagining frame 0 with empty joint and layer models
8.6 Hypothesized joint model
8.7 Hypothesized layer model
8.8 Imagining frame 11
8.9 A short movie
8.10 Event graph for the short movie
8.11 Perceptual primitives recovered for the short movie: I
8.12 Perceptual primitives recovered for the short movie: II
8.13 Event graph for Abigail's movie
8.14 Imagining frame 172
8.15 Three tables collectively supporting an object
8.16 Experiment 1 from Freyd et al. (1988)
8.17 Experiment 2 from Freyd et al. (1988)
9.1 One step continuous simulation
9.2 Sliding
9.3 Falling over
9.4 Translating a line segment f until its endpoint p(f) touches another line segment g
9.5 Translating a line segment f until its endpoint p(f) touches the endpoint p(g) of another line segment g
9.6 Translating a circle f until it is tangent to a line segment g
9.7 Translating a line segment f until its endpoint p(f) touches a circle g
9.8 Translating a line segment f until its endpoint p(f) touches a circle g
9.9 Translating a circle f until blocked by another circle g when f and g are outside each other
9.10 Translating a circle f until blocked by another circle g when f is inside g
9.11 Rotating a line segment f until its endpoint p(f) touches another line segment g
9.12 Rotating a line segment f until its endpoint p(f) touches the endpoint p(g) of another line segment g
9.13 Rotating a circle f until it is tangent to a line segment g
9.14 Rotating a line segment f until its endpoint p(f) touches a circle g
9.15 Rotating a circle f until blocked by another circle g when f and g are outside each other
9.16 Rotating a circle f until blocked by another circle g when f is inside g
9.17 Situations requiring clusters
9.18 Tangential translation
9.19 Tangential rotation
9.20 Barriers
9.21 Coincident line segments
9.22 Roundoff errors can cause substantiality violations
9.23 Imagination limitations: I
9.24 Imagination limitations: II
9.25 Rube Goldberg mechanism
9.26 An experiment demonstrating infant knowledge of substantiality
9.27 A second experiment demonstrating infant knowledge of substantiality
9.28 An experiment testing infant knowledge of gravity
9.29 An experiment demonstrating infant knowledge of continuity


Chapter  1 

Overview 


This thesis addresses two questions related to language. First: How do children learn the
language-specific components of their native language? Second: How is language grounded in perception?
These two questions are intimately related. One piece of language-specific information which children must
learn is word meanings. Knowledge of the meanings of utterances containing unknown words presumably
aids children in the process of determining the meanings of those words. A complete account of such
a process must ultimately explain how children extract utterance meanings from their non-linguistic
context. Thus the study of child language acquisition has motivated the study of event perception as a
means of grounding language in perception.

The long-term goal of this research is a comprehensive theory of language acquisition grounded in
visual perception. This thesis, however, presents more modest short-term accomplishments. Currently, the
language acquisition and perception components are the subjects of independent investigation. Part I
of this thesis discusses language acquisition while part II discusses event perception. These two parts,
however, fit into a common language processing architecture, which this thesis takes to be reflective of
the actual human language faculty. Figure 1.1 depicts this architecture. In its entirety, the architecture
constitutes a relation between linguistic utterances, the non-linguistic observations to which those
utterances refer, and a language model which mediates that mapping. The architecture itself is presumed to
be innate and universal. Any language-specific information is encoded in the language model. Language
acquisition can be seen as the task of learning that language model from utterances paired with
observations derived from the non-linguistic context of those utterances. The language model to be acquired
is the one which successfully maps those utterances heard by a child to the observed events.

The language processing architecture divides into three processing modules which relate seven
representations. The language model contains two parts: a grammar encoding language-specific syntactic
knowledge, and a lexicon. The lexicon in turn contains two parts, one mapping words to their syntactic
categories and the other mapping words to their meanings. A parser relates utterances to their syntactic
structure. While the parser itself encodes universal syntactic knowledge, presumed to be innate,
the  mapping  between  utterances  and  their  syntactic  structure  is  also  governed  by  the  language-specific 
grammar  and  the  syntactic  categories  of  words.  A  linker  relates  the  meaning  of  an  entire  utterance, 
represented  as  a  semantic  structure,  to  the  meanings  of  the  words  comprising  that  utterance,  taken  from 
the  lexicon.  This  mapping  is  presumably  mediated  by  the  syntactic  structure  of  the  utterance.  Finally,  a 
perception  module  relates  semantic  structures  denoting  the  meanings  of  utterances  to  the  non-linguistic 
observations  referred  to  by  those  utterances. 
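As a rough illustration of the linker's job, the sketch below composes an utterance meaning from word meanings guided by a syntactic tree. The lexicon, the tuple encoding of trees, and the rule that a head's arguments fill its variables x, y, z from left to right are all hypothetical simplifications; they stand in for, and are not, the linking rule the thesis actually adopts (Jackendoff's linking rule, Figure 3.1).

```python
import re

# Hypothetical word-to-meaning entries; x, y, z mark argument slots.
LEXICON = {
    "John": "John",
    "walked": "WALK(x, y)",
    "to": "TO(x)",
    "school": "school",
}

SLOTS = ("x", "y", "z")


def link(tree):
    """Return the meaning of a tree.

    A tree is either a word (a leaf) or a tuple (head, arg1, arg2, ...).
    The arguments are linked first, and their meanings are substituted,
    in order, for the slots x, y, z in the head's meaning.
    """
    if isinstance(tree, str):
        return LEXICON[tree]
    head, *args = tree
    bindings = {slot: link(arg) for slot, arg in zip(SLOTS, args)}
    # Substitute every slot in a single pass so text inserted for x is never
    # rescanned when y is filled; unbound slots are left untouched.
    return re.sub(r"\b[xyz]\b", lambda m: bindings.get(m.group(0), m.group(0)), LEXICON[head])


print(link(("walked", "John", ("to", "school"))))
```

For the tree (walked John (to school)) this yields WALK(John, TO(school)), the meaning representation used for John walked to school in the bootstrapping example of this chapter.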

This architecture can be thought of as an undirected declarative relation. By specifying the direction
of information flow, the architecture can be applied to different tasks. Taking an utterance and language
model  as  input  and  producing  predicted  observations  as  output  constitutes  using  the  architecture  as 




[Figure 1.1: diagram relating utterance, syntactic structure, semantic structures, and observation through a language model comprising a grammar and a lexicon of syntactic categories and word meanings.]

Figure 1.1: A generic language processing architecture. It contains three processing modules: a
parser, a linker, and a perceptual component, that mutually constrain five representations: the
input utterance, the syntax of that utterance, the meaning of that utterance, the visual perception
of events in the world, and a language model comprising a grammar and a lexicon. The lexicon
in turn maps words to their syntactic category and meaning. Given input comprising utterances
paired with observations of their use, this architecture can produce as output a language model
which allows the utterances to explain those observations. The bulk of this thesis is an elaboration
on this process.


a language comprehension device. Taking an observation and language model as input and producing
as output utterances that describe that observation constitutes using the architecture as a language
generation device. Taking an utterance paired with an observation as input and producing as output
a language model which allows the utterance to have an interpretation consistent with the observation
constitutes using the architecture as a language acquisition device. The first two applications of this
architecture are conventional and well-known. The third application, language acquisition, is the novel
application  considered  by  this  thesis. 
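To make the three uses of the architecture concrete, here is a toy sketch of one relation queried in three directions. It assumes, purely for illustration, that an utterance's meaning is just the set of its words' meaning symbols (so the grammar and syntactic categories play no role) and that a lexicon maps distinct words to distinct symbols. The function names comprehend, generate, and acquire are hypothetical.

```python
from itertools import permutations


def comprehend(utterance, lexicon):
    """Utterance + language model -> meaning (comprehension)."""
    return frozenset(lexicon[word] for word in utterance)


def generate(meaning, lexicon, length):
    """Meaning + language model -> utterances of a given length (generation)."""
    return [u for u in permutations(lexicon, length) if comprehend(u, lexicon) == meaning]


def acquire(utterance, meaning):
    """Utterance + meaning -> every one-to-one lexicon relating them (acquisition)."""
    words = list(dict.fromkeys(utterance))  # distinct words, in order of appearance
    lexicons = [dict(zip(words, symbols))
                for symbols in permutations(sorted(meaning), len(words))]
    return [lex for lex in lexicons if comprehend(utterance, lex) == meaning]


lexicon = {"john": "JOHN", "walked": "WALK", "home": "HOME"}
utterance = ("john", "walked", "home")
meaning = comprehend(utterance, lexicon)
print(sorted(meaning))
print(len(generate(meaning, lexicon, 3)))
print(len(acquire(utterance, meaning)))
```

Because this toy ignores syntax, generation returns all six word orders, and a single utterance paired with its meaning admits six lexicons, one per assignment of symbols to words. That residual ambiguity is exactly what a learner must resolve.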

Part I of this thesis addresses the two leftmost modules of the architecture from figure 1.1, namely
the parser and linker. It presents a theory, implemented in three different computer programs, for
deriving a language model from utterances paired with semantic structures denoting their meaning.
Part II of this thesis addresses the third module from figure 1.1, namely perception. It presents a theory,
again implemented as a computer program, for deriving semantic structures which describe the events
observed in visual input. As stated before, the long-term goal of this research is to tie these two
components together. Currently, however, the two halves of this thesis are formulated using incompatible
representations of semantic structure. This is due primarily to the preliminary nature of this work. The
work on language acquisition predates the work on event perception and was formulated around a
semantic representation which later proved inadequate for grounding language in perception. Although in
their details the techniques presented in part I of this thesis depend on the old representation, preventing
the joint operation of the two programs, at a more general level they transcend the particular
representations used. This, combined with the fact that the semantic representation used in part I of
this thesis is still widely accepted in the linguistic community, precludes obsolescence of the material
presented in part I.

As part of learning their native language, children must learn at least three types of information:
word-to-category mappings, word-to-meaning mappings, and language-specific syntactic information.
Collectively, this information is taken to constitute a language model. Part I of this thesis discusses
techniques for learning a language model given utterances paired with semantic structures denoting
their meaning. The language model can be seen as a set of propositions, each denoting some linguistic fact
particular to the language being learned. For example, the language model for English might contain
the propositions 'table is a noun', 'table means table',¹ and 'prepositions precede their complements'.

Acquisition of the language model might proceed in stages. The process of learning new propositions
might be aided by propositions already acquired in previous stages. To avoid infinite regress, however, the
process must ultimately start with an empty language model containing no language-specific information.
The task of learning a language model with no prior language-specific information has become known as
language bootstrapping. The models explored in part I of this thesis address language bootstrapping.

The language bootstrapping task is illustrated by the following small example. Let us assume
that the learner hears the utterance John walked to school. In addition, let us assume that the
learner can discern the meaning of that utterance from its non-linguistic context. Furthermore, let
us take WALK(John, TO(school)) to be the representation of that meaning. The learner would
attempt to form an analysis of this input which was consistent with her model of universal grammar. For
instance, the learner might postulate the following analysis.
If the learner could determine that this analysis was correct, she could add a number of propositions
to her language model, including 'John is a noun', 'John means John', and 'prepositions precede their
complements'. Unfortunately, the following analysis might also be consistent with the learner's model
of universal grammar.


¹Throughout this thesis, words in italics denote linguistic tokens while words in boldface or UPPER CASE denote
semantic representations of word meanings. Furthermore, there is no prior correspondence between a linguistic token such
as table and a semantic token such as table, even though they share the same spelling. They are treated as uninterpreted
tokens. The task faced by the learner is to acquire the appropriate correspondences as word-to-meaning mappings.
If the learner adopted this analysis, she would incorrectly augment her language model with the
propositions 'John is a verb', 'John means WALK(x, y)', and 'prepositions follow their complements'.
During later stages of language acquisition, the partial language model might aid the learner in filtering
out incorrect analyses. Such assistance is not available during language bootstrapping, however.

Many competing theories of language acquisition (cf. Pinker 1984 and Lightfoot 1991) address this
problem by suggesting that the learner employs a conservative trigger-based strategy whereby she
augments her language model with only those propositions that are uniquely determined given her current
language model and the current input utterance taken in isolation. In the above situation, trigger-based
strategies would not make any inferences about the language being learned, since that utterance alone
does not uniquely determine any language-specific fact. Trigger-based strategies have difficulty explaining
language bootstrapping due to the rarity of situations where an input utterance has a single analysis
given a sparse language model.

This thesis adopts an alternative cross-situational learning strategy to account for language
bootstrapping. Under this strategy, the learner attempts to find a language model which is consistent across
multiple utterances. Each utterance taken in isolation might admit multiple analyses while the collection
of several utterances might allow only a single consistent analysis. This allows the learner to acquire
partial knowledge from ambiguous situations and combine such partial knowledge across situations to
infer a unique language model despite the ambiguity in the individual isolated situations. For example,
the learner could rule out the second analysis given above upon hearing the utterance Mary walked to
school paired with WALK(Mary, TO(school)), since this utterance does not admit an analysis which
takes school to mean John. This cross-situational approach thus also alleviates the need to assume prior
knowledge, since all such knowledge can be acquired simultaneously by the same mechanism. A naive
implementation of cross-situational learning would require the learner to remember prior utterances to
make a collection of utterances available to cross-situational analysis. Such an approach would not be
cognitively plausible. Part I of this thesis explores a number of techniques for performing cross-situational
learning without keeping track of prior utterances.
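Here is a minimal Python sketch of strong cross-situational learning over a toy lexicon. It is not the implementation described in chapter 4. It assumes a fixed, known vocabulary (`VOCAB`) and symbol inventory (`SYMBOLS`), both hypothetical. It also uses a deliberately simplified consistency test in which each word contributes exactly one meaning symbol and word order is ignored. The learner carries forward only the set of surviving candidate lexicons, not the prior utterances themselves.

```python
from itertools import product

# Hypothetical toy vocabulary and semantic-symbol inventory.
VOCAB = ["John", "Mary", "walked"]
SYMBOLS = ["JOHN", "MARY", "WALK"]


def all_lexicons():
    """Enumerate every mapping from words to single semantic symbols."""
    for choice in product(SYMBOLS, repeat=len(VOCAB)):
        yield dict(zip(VOCAB, choice))


def consistent(lexicon, words, meaning):
    """Simplified test: the words' symbols must match the meaning's symbols
    exactly, one symbol per word, ignoring word order."""
    return sorted(lexicon[w] for w in words) == sorted(meaning)


def learn(observations):
    """Filter the candidate lexicons one observation at a time; only the
    surviving candidates, not the utterances, are carried forward."""
    candidates = list(all_lexicons())
    for words, meaning in observations:
        candidates = [lex for lex in candidates if consistent(lex, words, meaning)]
    return candidates


observations = [
    (["John", "walked"], ["JOHN", "WALK"]),  # ambiguous in isolation
    (["Mary", "walked"], ["MARY", "WALK"]),  # also ambiguous in isolation
]
print(len(learn(observations[:1])))  # 6 candidates after one utterance
print(learn(observations))           # a single lexicon survives
```

Each utterance alone admits several lexicons, and only their combination pins down one. The exhaustive enumeration of |SYMBOLS|^|VOCAB| lexicons also shows why strong cross-situational learning is computationally expensive.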

Let me elaborate a bit on my use of the term cross-situational. While learning language, children are
exposed to a continual stream of situations where they hear utterances in their non-linguistic context.
Intuitively, the term cross-situational describes a strategy whereby the learner acquires language by
analyzing multiple situations. Clearly, a child cannot learn her entire native language from a single pair
of linguistic and non-linguistic observations. Thus in a trivial sense, all learning strategies are
cross-situational. This thesis, however, uses the term to describe a very particular strategy, one whereby the
learner finds a single language model which can consistently account for all of the observed situations.
A language model must meet two criteria to account for an observed situation. First, it must allow
the utterances heard in that situation to be syntactically well-formed. Second, it must allow those
utterances to be semantically true and relevant to their non-linguistic context. Thus using this strategy,
the learner applies all possible syntactic and semantic constraints across all of the observed situations
to the language acquisition task. This strategy is described in greater detail in chapter 3 where it is
called strong cross-situational learning. This strategy dates back at least to Chomsky (1965). This thesis
renders this strategy more precise and tests it on several concrete linguistic theories.

It is instructive to contrast this strategy with a number of alternatives. Gold (1967) describes a
strategy whereby the learner enumerates the possible language models {L1, L2, ...}, first adopting the
language model L1 and subsequently switching to the next language model in the sequence when the
current language model cannot account for the current observation. Hamburger and Wexler (1975)
describe a variant of this strategy where the learner does not try the alternative language models in any
particular enumerated order but rather switches to a new language model at random when the current
language model fails to account for the observation. The new language model is restricted to be related
to the previous language model by a small number of change operators. These strategies are weaker
than strong cross-situational learning since when the learner switches to a new language model that
is consistent with the current observation, she does not check that it is also consistent with all prior
observations.
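A toy sketch can illustrate the difference. It assumes hypothetical models `L1`, `L2`, `L3`, each represented simply by the set of observations it accounts for. The enumeration learner follows the switching rule described above and never re-checks earlier observations. The strong cross-situational learner keeps only the models consistent with every observation.

```python
# Hypothetical toy language models, each represented only by the set of
# observations it can account for.
MODELS = [
    ("L1", {"a"}),
    ("L2", {"b"}),
    ("L3", {"a", "b"}),
]


def enumeration_learner(observations):
    """Keep the current model until it fails on the current observation, then
    move forward through the enumeration to the next model that accounts for
    that observation alone. Earlier observations are never re-checked."""
    index = 0
    for obs in observations:
        while obs not in MODELS[index][1]:
            index += 1  # raises IndexError if no later model fits (ignored here)
    return MODELS[index][0]


def strong_learner(observations):
    """Return every model consistent with all observations seen so far."""
    return [name for name, covered in MODELS if all(o in covered for o in observations)]


print(enumeration_learner(["a", "b"]))  # L2, which cannot account for "a"
print(strong_learner(["a", "b"]))       # ['L3']
```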

The strategy adopted by Gold does not impose any structure on the language model. It is often
natural, however, to view the language model as comprising attribute-value pairs. Such pairs may represent
word-to-category mappings, word-to-meaning mappings, or values of syntactic parameters. Another
common learning strategy is to form the set of alternate values for each attribute that are consistent with
each utterance as it is processed and intersect those sets. The value of an attribute is determined when a
singleton set remains. Pinker (1987a) adopts this strategy to describe the acquisition of word-to-meaning
mappings. More generally it can be used to learn any information represented as attribute-value pairs,
including word-to-category mappings and syntactic parameter settings. Chapter 3 refers to this strategy
as weak cross-situational learning and demonstrates that it is weaker than strong cross-situational
learning. This reduction in power can be explained simply as follows. Consider a language model with two
attributes a1 and a2, each having two possible values v1 and v2. Nominally, this would allow four distinct
language models. It may be the case that setting a1 to v1 is mutually inconsistent with setting a2 to v2,
even though all three remaining possible language models are consistent. It is impossible to represent
such information using only sets of possible attribute values since in this case, there exists some language
model consistent with each attribute-value pair in isolation. Thus weak cross-situational learning may
fail to rule out some inconsistent language models which would be ruled out by strong cross-situational
learning.
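The two-attribute argument can be checked mechanically. The sketch below assumes a hypothetical joint constraint that rules out a1 = v1 together with a2 = v2. A weak cross-situational learner stores only per-attribute value sets. Projecting the consistent models onto those sets and taking their Cartesian product reintroduces the excluded combination.

```python
from itertools import product

VALUES = ["v1", "v2"]


def consistent(model):
    """Hypothetical joint constraint: a1 = v1 together with a2 = v2 is ruled out."""
    a1, a2 = model
    return not (a1 == "v1" and a2 == "v2")


# Strong cross-situational learning keeps the joint models themselves.
strong = {m for m in product(VALUES, repeat=2) if consistent(m)}

# Weak cross-situational learning keeps only a value set per attribute ...
a1_values = {a1 for a1, _ in strong}
a2_values = {a2 for _, a2 in strong}
# ... so the models it implicitly allows are the Cartesian product of those sets.
weak = set(product(a1_values, a2_values))

print(sorted(strong))         # the three consistent models
print(sorted(weak - strong))  # [('v1', 'v2')]: allowed by weak, excluded by strong
```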

Strong cross-situational learning is a powerful but computationally expensive technique. Some of
the implementations discussed in chapter 4 do use full strong cross-situational learning. For reasons of
computational efficiency, however, some of the implementations use weaker strategies. These weaker
strategies differ from both weak cross-situational learning and the enumeration strategies described
above. They will be described in detail in chapter 4.

The actual language learning task faced by children is somewhat more complex than the task
portrayed by the example described earlier. That example assumed that the learner could determine the
correct meaning of an utterance from context and simply needed to associate parts of that meaning with
the appropriate words in the utterance. It is likely, however, that children face referential uncertainty
during language learning: situations where the meaning of an utterance is uncertain. They might be
able to postulate several possible meanings consistent with the non-linguistic context of an utterance
but might not be sure which of these possible meanings is the correct meaning of the utterance. Unlike
trigger-based strategies, cross-situational learning techniques can learn in the presence of referential
uncertainty.
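Referential uncertainty changes only the consistency test. Each utterance is now paired with a list of possible meanings, and a lexicon survives if it accounts for at least one of them. The following self-contained variant of a toy enumeration learner makes several assumptions. The vocabulary is hypothetical, and BILL is a symbol for a person who is present in a scene but never named. It keeps the same simplified test, with one symbol per word and word order ignored.

```python
from itertools import product

# Hypothetical toy inventory; BILL names someone present but never mentioned.
VOCAB = ["John", "Mary", "walked"]
SYMBOLS = ["JOHN", "MARY", "BILL", "WALK"]


def consistent(lexicon, words, possible_meanings):
    """A lexicon accounts for an utterance if it matches at least one of the
    candidate meanings (one symbol per word, word order ignored)."""
    derived = sorted(lexicon[w] for w in words)
    return any(derived == sorted(m) for m in possible_meanings)


def learn(observations):
    """Keep only the lexicons consistent with every observation so far."""
    candidates = [dict(zip(VOCAB, c)) for c in product(SYMBOLS, repeat=len(VOCAB))]
    for words, possible_meanings in observations:
        candidates = [lex for lex in candidates if consistent(lex, words, possible_meanings)]
    return candidates


observations = [
    (["John", "walked"], [["JOHN", "WALK"], ["BILL", "WALK"]]),  # who walked?
    (["Mary", "walked"], [["MARY", "WALK"]]),
    (["John", "walked"], [["JOHN", "WALK"], ["MARY", "WALK"]]),  # who walked?
]
print(len(learn(observations[:2])))  # 2: John may still mean JOHN or BILL
print(learn(observations))           # the uncertainty is resolved
```

No single situation identifies the referent of John, but only JOHN is a possible referent in both situations where John is uttered.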

Part I of this thesis applies a cross-situational learning strategy to the task of learning a language
model comprising word-to-category mappings, word-to-meaning mappings, and language-specific components
of grammar, without access to prior language-specific knowledge, given utterance-meaning pairs
which exhibit referential uncertainty. This strategy has been implemented in a series of computer
programs which test this strategy on linguistic theories of successively greater sophistication. In accord
with current hypotheses about child language acquisition, these systems use only positive examples to
drive their acquisition of a language model. The operation of Davra is typical of these programs.
Figure 1.2 illustrates a sample corpus presented as input to Davra. Note that this corpus exhibits
referential uncertainty in that each utterance is paired with several possible meanings for that
utterance. Given this corpus, Davra can derive the language model illustrated in figure 1.3. Davra learns
that English is head-initial and SPEC-initial. Furthermore, Davra learns unique word-to-category and
word-to-meaning mappings for most of the words in the corpus.

Part I of this thesis discusses three language acquisition programs which incorporate cross-situational
learning techniques. Maimra, the first program developed, learns word-to-meaning and word-to-category
mappings from a corpus pairing utterances with sets of expressions representing the potential
meanings of those utterances hypothesized by the learner from the non-linguistic context. Maimra's
syntactic theory is embodied in a fixed context-free grammar. Davra, the second program developed,
extends Maimra by replacing the context-free grammar with a parameterized variant of X-bar theory. Given
the same corpus as Maimra, Davra learns the parameter settings for X-bar theory in addition to a lexicon
mapping words to their syntactic category and meaning. Davra has been successfully applied, without
change, to tiny corpora in both English and Japanese, learning the requisite lexica and parameter settings
despite differences in word order between the two languages. Kenunia, the third program developed,
incorporates a more comprehensive model of universal grammar supporting movement, adjunction, and
empty categories, as well as more extensive parameterization of its X-bar theory component. This model of
universal grammar is based on recent linguistic theory and includes such notions as the DP hypothesis,
VP-internal subjects, and V-to-I movement. Kenunia is able to learn the parameter settings of this
model, as well as word-to-category mappings, in the presence of movement and empty categories. All of
these programs strive to model language bootstrapping, with little or no access to prior language-specific
knowledge, in the presence of referential uncertainty. Chapter 4 will present, in detail, the algorithms
underlying Maimra, Davra, and Kenunia along with annotated examples depicting their operation
on sample learning tasks.

Part II of this thesis addresses the task of grounding semantic representations in visual perception.
In doing so it asks three questions, offering novel answers to each. The first question is: What is an
appropriate semantic representation that can allow language to be grounded in perception? Chapter 7
advances the claim that an appropriate semantic representation for the meanings of simple spatial
motion verbs such as throw, pick up, put, and walk must incorporate the notions of support, contact, and
attachment, as these notions play a central role in differentiating occurrences of events described by those
words from non-occurrences. Prior representations of verb meaning focused on the aspects of motion
depicted by the verb. For example, Miller (1972), Schank (1973), Jackendoff (1983), and Pinker (1989)
all gloss throw roughly as 'to cause an object to move'. This misses two crucial components of throwing:
the requirement that the motion be caused by moving one's hand while grasping the object (contact and
attachment) and the requirement that the resulting motion be unsupported. Chapter 7 presents a novel
lexical semantic representation based on the notions of support, contact, and attachment, and uses that
representation to characterize the prototypical events described by numerous spatial motion verbs.

Given that support, contact, and attachment relations play a central role in defining verb meanings,
a natural second question arises: How are support, contact, and attachment relations between objects
perceived? Chapter 8 offers an answer to that question: counterfactual simulation, that is, imagining the
short-term future of a potentially modified image under the effects of gravity and other physical forces. For


BE(person1, AT(person3)) ∨ BE(person1, AT(person2)) ∨
GO(person1, [path ]) ∨ GO(person1, FROM(person3)) ∨
GO(person1, TO(person2)) ∨ GO(person1, [path FROM(person3), TO(person2)])

John rolled.

BE(person2, AT(person3)) ∨ BE(person2, AT(person1)) ∨
GO(person2, [path ]) ∨ GO(person2, FROM(person3)) ∨
GO(person2, TO(person1)) ∨ GO(person2, [path FROM(person3), TO(person1)])

Mary rolled.

BE(person3, AT(person1)) ∨ BE(person3, AT(person2)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(person2)) ∨ GO(person3, [path FROM(person1), TO(person2)])

Bill rolled.

BE(object1, AT(person1)) ∨ BE(object1, AT(person2)) ∨
GO(object1, [path ]) ∨ GO(object1, FROM(person1)) ∨
GO(object1, TO(person2)) ∨ GO(object1, [path FROM(person1), TO(person2)])

The cup rolled.

BE(person3, AT(person1)) ∨ BE(person3, AT(person2)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(person2)) ∨ GO(person3, [path FROM(person1), TO(person2)])

Bill ran to Mary.

BE(person3, AT(person1)) ∨ BE(person3, AT(person2)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(person2)) ∨ GO(person3, [path FROM(person1), TO(person2)])

Bill ran from John.

BE(person3, AT(person1)) ∨ BE(person3, AT(object1)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(object1)) ∨ GO(person3, [path FROM(person1), TO(object1)])

Bill ran to the cup.

BE(object1, AT(person1)) ∨ BE(object1, AT(person2)) ∨
GO(object1, [path ]) ∨ GO(object1, FROM(person1)) ∨
GO(object1, TO(person2)) ∨ GO(object1, [path FROM(person1), TO(person2)])

The cup slid from John to Mary.

ORIENT(person1, TO(person2)) ∨
ORIENT(person2, TO(person3)) ∨
ORIENT(person3, TO(person1))

John faced Mary.


Figure 1.2: A sample corpus presented to Davra. The corpus exhibits referential uncertainty in
that each utterance is paired with several possible meaning expressions. Davra is not told which is
the correct meaning, only that one of the meanings is correct.




CHAPTER 1. OVERVIEW


Head Initial. SPEC Initial.

John:    [N]        person1
Mary:    [N]        person2
Bill:    [N]        person3
cup:     [N]        object1
the:     [NSPEC]    ⊥
rolled:  [V]        GO(x, [path ])
ran:     [V]        GO(x, y)
slid:    [V]        GO(x, [path y, z])
faced:   [V]        ORIENT(x, TO(y))
from:    [N, V, P]  FROM(x)
to:      [N, V, P]  TO(x)


Figure 1.3: The language model inferred by Davra for the corpus from figure 1.2. Note that Davra
has converged to a unique word-to-meaning mapping for each word in the corpus, as well as a unique
word-to-category mapping for all but two words.


instance, one determines that an object is unsupported if one imagines it falling. Likewise, one determines
that an object A supports an object B if B is supported but falls when one imagines a world without A.
An object A is attached to another object B if one must hypothesize such an attachment to explain the
fact that one object supports the other. Likewise, two objects must be in contact if one supports the
other.
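Below is a deliberately crude sketch of the support definitions. It assumes a hypothetical scene of axis-aligned blocks. Its "imagination" simply drops each block straight down until it rests on the ground or on another block, with no tipping, friction, or attachment. It is not Abigail's mechanism. It only shows how "unsupported" and "supports" can be defined by counterfactually removing an object and imagining the outcome.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned rectangle in a 2-D scene (integer units)."""
    name: str
    left: int
    right: int
    bottom: int
    height: int


def overlaps(a, b):
    """True if two blocks overlap horizontally."""
    return a.left < b.right and b.left < a.right


def imagine_falling(blocks):
    """Crude stand-in for imagination: drop each block straight down, lowest
    first, until it rests on the ground (y = 0) or on a block beneath it.
    Ignores tipping, friction, and attachment."""
    settled = []
    for b in sorted(blocks, key=lambda blk: blk.bottom):
        floor = max((s.bottom + s.height for s in settled if overlaps(s, b)), default=0)
        settled.append(replace(b, bottom=floor))
    return {b.name: b for b in settled}


def falls(blocks, name):
    """True if the named block moves down in the imagined future."""
    before = {b.name: b for b in blocks}[name]
    return imagine_falling(blocks)[name].bottom < before.bottom


def supports(blocks, a, b):
    """A supports B if B is supported but falls in an imagined world without A."""
    without_a = [blk for blk in blocks if blk.name != a]
    return not falls(blocks, b) and falls(without_a, b)


scene = [
    Block("table", left=0, right=4, bottom=0, height=2),
    Block("cup", left=1, right=2, bottom=2, height=1),
    Block("ball", left=6, right=7, bottom=5, height=1),
]
print(falls(scene, "ball"))             # True: the ball is unsupported
print(supports(scene, "table", "cup"))  # True
print(supports(scene, "cup", "table"))  # False
```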

Counterfactual simulation relies on a modular imagination capacity. This capacity takes the
representation of a possibly modified image as input and predicts the short-term consequences of such
modifications, determining whether some predicate P holds in any of the series of images depicting the
short-term future. The imagination capacity is modular in the sense that the same unaltered mechanism
is used for a variety of purposes, varying only the predicate P and the initial image model between
calls. This leads to the third question: How does the imagination capacity operate? Nominally, the
imagination capacity can be thought of as a kinematic simulator. To predict the future, this simulator
would embody physical knowledge of how objects behave under the influence of physical forces such
as gravity. Traditional approaches to kinematic simulation take physical accuracy and the ability to
simulate mechanisms of arbitrary complexity to be primary. They typically operate by integrating the
aggregate forces on objects, relegating collision detection to a process of secondary importance.
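The modularity claim can be expressed as a single higher-order function. The simulator is fixed, and callers vary only the initial image and the predicate P. In this hypothetical sketch an "image" is reduced to the height of a single object, and `drop` is an assumed one-frame gravity step.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def imagine(initial: State,
            step: Callable[[State], State],
            predicate: Callable[[State], bool],
            horizon: int = 10) -> bool:
    """Run a fixed simulator `step` forward from a (possibly modified) initial
    image and report whether `predicate` holds in any imagined future frame."""
    state = initial
    for _ in range(horizon):
        state = step(state)
        if predicate(state):
            return True
    return False


def drop(height: int) -> int:
    """Hypothetical one-frame gravity step for a single object above the ground."""
    return max(height - 1, 0)


# The same mechanism, called with different initial images and predicates.
print(imagine(3, drop, lambda h: h < 3))   # True: an object at height 3 falls
print(imagine(0, drop, lambda h: h < 0))   # False: an object on the ground stays put
print(imagine(3, drop, lambda h: h == 0))  # True: it eventually reaches the ground
```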

Human perception appears to be based on different principles, however. These include the following.

substantiality:  Solid  objects  don’t  pass  through  one  another. 

continuity:  Objects  follow  continuous  paths  when  moving  from  one  location  to  another.  They  don't 
disappear  and  reappear  elsewhere  later. 

gravity:  Unsupported  objects  fall. 

ground  plane:  The  ground  acts  as  universal  support  for  all  objects. 

These principles are pervasive. It is hard to imagine situations that violate these principles. Traditional
kinematic simulation, however, violates some of these principles as a matter of course. Numerical
integration violates continuity, since objects jump between positions computed at discrete time steps.
Performing collision detection exogenous to numerical integration will admit substantiality violations up
to the tolerance allowed by the integration step size. Thus traditional approaches to kinematic simulation
do not appear to be appropriate foundations for a model of the human imagination capacity.
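The substantiality problem can be reproduced with a toy example. It assumes a point object, a thin floor slab 0.1 units thick, and semi-implicit Euler integration, with the collision check performed only at the sampled instants. With a coarse step, the object's sampled positions jump straight past the slab, so no collision is ever detected. A finer step detects the collision, at greater computational cost.

```python
def sampled_collision(y0, v0, dt, steps, floor_bottom=0.0, floor_top=0.1, g=-9.8):
    """Integrate a falling point with semi-implicit Euler and check for a
    collision with a thin floor slab only at the sampled instants."""
    y, v = y0, v0
    for _ in range(steps):
        v += g * dt
        y += v * dt
        if floor_bottom <= y <= floor_top:
            return True  # a sample landed inside the slab
    return False


# One coarse step moves the point about 1.5 units, clean through the slab.
print(sampled_collision(1.0, -30.0, dt=0.05, steps=20))     # False
print(sampled_collision(1.0, -30.0, dt=0.001, steps=2000))  # True
```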

Chapter 9 advances the claim that the imagination capacity used for counterfactual simulation and
event perception is organized along very different lines than traditional kinematic simulators. It directly
encodes the principles of substantiality, continuity, gravity, and ground plane. It takes collision detection
to be primary and physical accuracy to be secondary. In doing so it must forgo the ability to simulate
mechanisms of arbitrary complexity. The reason for this shift in priorities is that collision detection is
more important than physical accuracy in determining support, contact, and attachment relations.

Chapters 8 and 9 review some experiments reported by Freyd et al. (1988), Baillargeon et al. (1985),
Baillargeon (1986, 1987) and Spelke (1988) which support the claims made in part II of this thesis. As
additional evidence, a simplified version of this theory has been implemented as a working computer
program called Abigail. Abigail watches a computer-generated animated movie depicting objects
participating in various events. Figure 1.4 illustrates selected frames from a sample movie shown to
Abigail. The images in this movie are constructed out of line segments and circles. The input to
Abigail consists solely of the positions, orientations, shapes, and sizes of these line segments and
circles during each frame of the movie. Abigail is not told which collections of line segments and circles
constitute objects. By applying the techniques described above, she must segment the image into objects
and determine the support, contact, and attachment relations between these objects as a foundation for
producing semantic descriptions of the events in which these objects participate. For example, Abigail
can determine that the man is unsupported in frame 11 of the movie by imagining him falling, as depicted
in figure 1.5.

The remainder of this thesis is divided into two parts comprising nine chapters. Chapters 2 through 5
constitute part I which discusses language acquisition. Chapter 2 introduces part I by defining the
bootstrapping problem and giving an overview of the cross-situational techniques used to address that
problem. Chapter 3 illustrates the power cross-situational learning has over trigger-based approaches by
demonstrating several small examples, completely worked through by hand, where cross-situational
techniques allow the learner to converge on a unique language model for a set of utterances even though each
utterance in isolation admits multiple analyses. Chapter 4 presents a detailed discussion of Maimra,
Davra, and Kenunia, three implemented computer models of language acquisition which incorporate
cross-situational techniques. Chapter 5 concludes part I by reviewing related work on language
acquisition and suggesting continued work for the future. Chapters 6 through 10 constitute part II which
addresses the grounding of language in perception. Chapter 6 introduces part II by describing the event
perception task faced by Abigail. Chapter 7 presents a novel lexical semantic representation centered
around the notions of support, contact, and attachment, giving definitions in this representation for
numerous simple spatial motion verbs. Chapter 8 discusses the event perception mechanisms used by
Abigail to segment images into objects and to recover the changing support, contact, and attachment
relations between those objects. Chapter 9 discusses Abigail's imagination capacity in detail, showing
how the imagination capacity explicitly encodes the naive physical constraints of substantiality,
continuity, gravity, and ground plane. Chapter 10 concludes part II by reviewing related work on event
perception and suggesting continued work for the future.


Figure  1.5:  The  sequence  of  images  produced  by  Abigail  while  imagining  the  short-term  future  of 
frame  11  from  the  movie  described  in  figure  1.4. 
Language Acquisition

Chapter 2

Introduction 


We can all agree that as part of the process of acquiring their native language, children must learn at
least three things: the syntactic categories of words, their meanings, and the language-specific components
of syntax. Such knowledge constitutes, at least in part, the language-specific linguistic knowledge
which children must acquire to become fluent speakers of their native language. Initially, children lack
any such language-specific knowledge. Yet they come to acquire that knowledge through the language
acquisition process. Part I of this thesis attempts to answer the following question: What procedure might
children employ to learn their native language, without any access to previously acquired language-specific
knowledge?

This question is not new nor is this the first attempt at providing an answer. The account offered in
this thesis, however, differs from prior accounts in a number of ways. These differences are summarized
by three issues highlighted in the question's formulation.

procedure: This thesis seeks a procedural description of the language acquisition process. To be an
adequate description, the procedure must be shown to work. Ideally, one must demonstrate that
it is capable of acquiring language given the same input that is available to children. Pinker (1979)
calls this the fidelity criterion. Such demonstration requires that the procedure be precisely
specified. Imprecise procedural specifications, typical of much prior work on language acquisition in
cognitive science,¹ admit only speculative evidence that such procedures do actually work and
are therefore an inadequate account of the language acquisition process. Ultimately, the most
satisfying account would be a procedural specification which is precise enough so that, at least in
principle, it could be implemented as a computer program. This thesis presents three different
precise procedures, each implemented as a working computer program which successfully solves very
small language acquisition tasks. The input to these programs approximates the input available
to children.

might: An ultimate account of child language acquisition would demonstrate not only a working language
acquisition procedure but also evidence that that procedure was the one actually used by
children.  This  thesis  demonstrates  only  that  certain  procedures  work.  It  makes  no  claim  that 
children  utilize  these  procedures.  Clearly,  it  makes  sense  to  suggest  that  children  employ  a  given 
procedure  only  once  one  knows  that  the  procedure  works.  Doing  otherwise  would  be  putting  the 
cart  before  the  horse.  This  thesis  views  the  task  of  proposing  working  procedures,  irrespective  of 
whether  children  employ  these  procedures,  as  the  first  step  toward  the  ultimate  goal  of  determining 
the  procedures  utilized  by  children. 

¹Notable exceptions to imprecise procedural specifications include the work of Hamburger and Wexler (1975) and
Berwick (1979, 1982).



without any prior access to previously acquired language-specific knowledge: To be a complete
account, a language acquisition procedure must not rely on previously acquired language-specific
knowledge. Doing so only reduces one problem to another unsolved problem. The problem
of how children begin the task of language acquisition, without any prior language-specific
knowledge, has become known as the bootstrapping problem. Most previous accounts assume that
children possess some language-specific knowledge, such as the meanings or syntactic categories
of nouns, before beginning to acquire the remaining language-specific information. Since these
accounts do not present methods for acquiring such preliminary language-specific knowledge, they
at worst suffer from problems of infinite regress. At best they describe only part of the language
acquisition process. While it may be the case that the language acquisition procedure employed
by children is indeed a staged process, to date no one has given a complete account of that
entire process. In contrast, the goal of this research program is to propose algorithms which do
not rely on any prior language-specific knowledge. Significant progress has been made toward
this goal. Chapter 4 presents three implemented language acquisition models. In accord with
current hypotheses about child language acquisition, these systems use only positive examples to
drive their acquisition of a language model. The first learns both word-to-category and
word-to-meaning mappings given prior access only to grammar. The second learns both word-to-category
and word-to-meaning mappings, as well as the grammar. The third learns word-to-category
mappings along with the grammar, given prior access only to word-to-meaning mappings. All of these
models, however, assume prior access to the phonological and morphological knowledge needed to
acoustically segment an utterance into words and recognize those words.

Part I of this thesis focuses solely on language bootstrapping. The remainder of this chapter
describes the bootstrapping problem in greater detail. It makes precise some assumptions this thesis
makes about the nature of the input to the language acquisition device, as well as the language-specific
knowledge to be learned. Some competing theories about language acquisition share a common learning
strategy: they attempt to glean linguistic facts from isolated observations. I call this strategy
trigger-based learning. This thesis advocates an alternative strategy, cross-situational learning, and suggests
that it may offer a better account of child language acquisition.

2.1  The  Bootstrapping  Problem 

The task of modeling child language acquisition is overwhelmingly complex. Given our current lack of
understanding, along with the immensity of the task, any proposed procedure will necessarily address
only an idealization of the task actually faced by children. Any idealization will make assumptions about
the nature of the input to the language acquisition device. Furthermore, any idealization will address
only a portion of the complete language acquisition task, and consider the remainder to be external
to that task. Before presenting the language acquisition procedures that I have developed, I will first
delineate the idealized problem which they attempt to solve.

I assume that the input to the language acquisition device contains both linguistic and non-linguistic
information. It seems clear that the input must contain linguistic information. The assumption that the
input contains non-linguistic information deserves some further discussion. Practically everyone will agree
that non-linguistic information is required for learning the meanings of words. As Fisher et al. (1991)
aptly state: “You can't learn a language simply by listening to the radio.” It is not clear, however,
that non-linguistic information is required for learning syntax. The tacit assumption behind the entire
field of formal learning theory (cf. Gold 1967 and Blum and Blum 1975) is that a learner can
learn syntax, or at least the ability to make grammaticality judgments, by observing linguistic information
alone. It might be the case that this is feasible. Furthermore, both Gleitman (1990) and
Fisher et al. (1991) suggest that, at least in part, verb meanings are constrained by their
subcategorization frames. Brent (1989, 1990, 1991a, 1991c) shows how verb subcategorization frames can be
derived from an untagged corpus of utterances without any non-linguistic information.^ Though neither
Gleitman, Fisher et al., nor Brent suggest this, it is conceivable that a learner could potentially learn
all of syntax, and some semantics, through exposure to linguistic information alone. Whether or not
children do so is an open question. Nonetheless, the procedures presented in this thesis utilize both
linguistic and non-linguistic information in the process of inferring both syntactic and semantic knowledge,
as is in fact typical of most other work in the field.

In the model considered here, the linguistic input to the language acquisition device is a symbolic
token stream consisting of a list of grammatical utterances, each utterance being a string of words.
Since the actual linguistic evidence available to children consists of an acoustic signal, this assumes
that children have the capacity for segmenting the acoustic stream into utterances and words, as well
as classifying different occurrences of a given word as the same symbolic token despite differences in
their acoustic waveforms. These segmentation and classification procedures, however, are likely to rely
at least in part on language-specific information. An ultimate account of language acquisition would
have to explain how children acquire such word segmentation and classification knowledge along with
other language-specific knowledge. For pragmatic reasons, the language acquisition procedures proposed
in this thesis, like most other proposed procedures, ignore this problem and assume that the learner
has the ability to preprocess the acoustic input to provide a symbolic token stream as input to the
language acquisition device. Also, like most other proposed procedures, this thesis assumes that the
symbolic information recovered from the acoustic input comprises only word and utterance boundary
information and word identity. Gleitman (1990) and Fisher et al. (1991) argue that children can also
recover information about syntactic structure from the prosodic portion of the acoustic signal and that
they utilize such information to aid the language acquisition process. It may be possible to extend the
strategies discussed in this thesis to use such prosodic information in a way that would improve their
performance. Such exploration remains for future work.

The general learning strategy put forth in this thesis is one of cross-situational learning. This strategy
is depicted in figures 2.1 and 2.2. It is incorporated, with minor variation, in all three of the implemented
systems discussed in chapter 4. Figure 2.1 illustrates a general language processing architecture. This
architecture is a portion of the more complete architecture depicted in figure 1.1. The perception
component has been omitted, as it will be the focus of part II of this thesis. Part I of this thesis instead
focuses on the remaining two processing modules, namely the parser and the linker. These two processing
modules relate six representations. The parser takes an utterance as input and produces syntactic
structures as output. The parsing process uses language-specific syntactic knowledge, in the form of a
grammar, along with the syntactic categories of words derived from the lexicon. Taken together, the
grammar and lexicon form a language model. The linker implements compositional semantics, combining
the meanings of individual words in the utterance, taken from the lexicon, and producing a semantic
structure representing the meaning of the entire utterance. This linking process is mediated by the
syntactic structure produced by the parser.

Traditionally, the architecture in figure 2.1 is conceived of as being a directed computing device.
As a language comprehension device, it receives an utterance, a grammar, and a lexicon as input,
and produces (perhaps several ambiguous) semantic structures as output. These semantic structures
constitute a representation of the meaning of the input utterance. As a language production device,
it receives a communicative goal as input, in the form of a semantic structure, along with a grammar
and a lexicon, and produces (perhaps several possible) utterances as output, each of which conveys the
semantic content of the desired communicative goal. These two uses of this architecture are conventional
and well-known. This thesis explores a novel third possibility. The architecture from figure 2.1 can be
viewed instead as a declarative relation that must hold between an utterance u, a semantic structure s, a

^This technique requires a small amount of prior language-specific knowledge in the form of a lexicon of closed-class
words and a small regular (finite-state) covering grammar for English.
Figure 2.1: A generic language processing architecture. The parser takes an input utterance, along
with a grammar and syntactic category information from the lexicon, and produces syntactic structures
as output. The linker then forms the meaning of the utterance, i.e. its semantic structure, out
of the meanings of its constituent words. Word meanings are taken from the lexicon. The linking
process is mediated by the syntactic structure produced by the parser for the utterance.
Figure 2.2: This figure illustrates how the generic language processing architecture from figure 2.1
can be used to support cross-situational learning. A copy of the architecture from figure 2.1 is made
for each utterance-meaning pair in the corpus. All of these copies are constrained to use the same
language model, i.e. the same grammar and lexicon. The learner must find a language model which
is consistent across the corpus.


grammar G, and a lexicon L. I will denote this declarative relation via the predicate U(G, L, u, s). Here,
U indicates whatever universal linguistic knowledge is presumed to be innate, while G and L indicate
language-specific grammatical and lexical knowledge that must be acquired. This architecture can be
presented with an input utterance u, paired with a semantic structure s representing its meaning. The
semantic structure s corresponding to u could be derived by observing the non-linguistic context of the
utterance u. The predicate U(G, L, u, s) then constrains the set of possible grammars G and lexica L
to those consistent with the assumption that the input utterance u has the given meaning s. Thus U
can be used in this fashion as a language acquisition device.
A single utterance paired with a single semantic structure is usually not sufficient to uniquely determine
the grammar and lexicon. The grammar and lexicon can, however, be uniquely determined through
cross-situational learning. The idea behind cross-situational learning is depicted in figure 2.2. Here, the
learner is presented with a sequence of utterances, each paired with a representation of its meaning.
The architecture from figure 2.1 is replicated, with each utterance-meaning pair being applied to its own
copy of the architecture. The different copies, however, are constrained to share the same grammar and
lexicon. This amounts to the following learning strategy.

Find G and L such that:

$$U(G, L, u_1, s_1) \wedge U(G, L, u_2, s_2) \wedge \cdots \wedge U(G, L, u_n, s_n).$$
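A minimal Python sketch of this conjunctive strategy follows. It is not one of the systems of chapter 4. It assumes a hypothetical toy stand-in for U in which the grammar G is fixed and ignored, the lexicon L maps each word either to a single concept symbol or to ⊥ (anticipating the null meaning introduced in section 3.1), and an utterance means the set of non-⊥ meanings of its words. The names toy_U and consistent_lexica, as well as the concept symbols, are illustrative assumptions.

```python
from itertools import product

BOTTOM = "⊥"  # null meaning, anticipating section 3.1

def toy_U(lexicon, utterance, meaning):
    """Hypothetical stand-in for U(G, L, u, s) with G fixed: the utterance is
    taken to mean the set of non-⊥ meanings of its words."""
    return {lexicon[w] for w in utterance} - {BOTTOM} == set(meaning)

def consistent_lexica(corpus, concepts):
    """Return every lexicon L such that toy_U(L, u_i, s_i) holds for all pairs."""
    words = sorted({w for utterance, _ in corpus for w in utterance})
    options = sorted(concepts) + [BOTTOM]
    found = []
    # Brute force: try every assignment of a concept (or ⊥) to every word.
    for choice in product(options, repeat=len(words)):
        lexicon = dict(zip(words, choice))
        if all(toy_U(lexicon, u, s) for u, s in corpus):
            found.append(lexicon)
    return found

corpus = [
    (("the", "ball"), {"BALL"}),
    (("the", "truck"), {"TRUCK"}),
]
concepts = {"BALL", "TRUCK"}

print(len(consistent_lexica(corpus[:1], concepts)))  # 3: one pair alone is ambiguous
solutions = consistent_lexica(corpus, concepts)
print(len(solutions), solutions[0]["ball"], solutions[0]["truck"])  # 1 BALL TRUCK
```

Neither pair alone fixes the lexicon, but only one lexicon, which assigns ⊥ to the, satisfies both. Brute-force enumeration is exponential in the vocabulary size and is used here only for clarity.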

The above learning strategy has a limitation, however. It requires that the learner unambiguously know
the complete and correct meaning of each input utterance. If the learner were mistaken and associated
the wrong meaning with but a single utterance, this architecture would either produce the wrong grammar
and lexicon as output or fail to find any grammar and lexicon consistent with the input
data. This limitation can be alleviated somewhat by relaxing the input requirement. We could instead
allow the learner to hypothesize a set of possible meanings for each utterance, most of which will be
incorrect. So long as the correct meaning is included in the set of meanings hypothesized for each
input utterance, the learner could still determine a grammar and lexicon using the following extended
strategy.

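Find G and L such that:

$$
\begin{aligned}
&[U(G, L, u_1, s_{11}) \vee \cdots \vee U(G, L, u_1, s_{1m_1})] \wedge \\
&[U(G, L, u_2, s_{21}) \vee \cdots \vee U(G, L, u_2, s_{2m_2})] \wedge \\
&\qquad \vdots \\
&[U(G, L, u_n, s_{n1}) \vee \cdots \vee U(G, L, u_n, s_{nm_n})].
\end{aligned}
$$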

Here the learner simply knows that one of the meanings $s_{i1}, \ldots, s_{im_i}$ is the correct meaning
for utterance $u_i$, yet need not know which one. For example, a child hearing the utterance
John threw the ball to Mary in a situation where John threw the ball to Mary while walking home from
school might conjecture that the utterance meant that John and Mary were playing, that Mary wanted
the ball, that John and Mary were walking, or a myriad of other possible meanings in addition to the
correct one. This type of ambiguity in the mapping of input utterances to their correct meanings will be
referred to as referential uncertainty. The process of determining G and L will, in retrospect, eliminate
the referential uncertainty and allow the learner to determine the correct meanings to associate with
each input utterance.
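The extended strategy can be illustrated with the same hypothetical toy (grammar fixed, an utterance meaning the set of non-⊥ meanings of its words). In this sketch each utterance is paired with several hypothesized meanings, one per object assumed to be in view, and a lexicon survives if, for every utterance, at least one hypothesis is true. The sketch also records which hypotheses survive, showing how fixing the lexicon removes the referential uncertainty in retrospect. All names and data are illustrative assumptions.

```python
from itertools import product

BOTTOM = "⊥"  # null meaning, anticipating section 3.1

def toy_U(lexicon, utterance, meaning):
    """Hypothetical stand-in for U(G, L, u, s), grammar fixed: an utterance
    means the set of non-⊥ meanings of its words."""
    return {lexicon[w] for w in utterance} - {BOTTOM} == set(meaning)

def consistent_lexica(corpus, concepts):
    """Lexica satisfying the conjunction of disjunctions. Returns
    (lexicon, chosen) pairs, where chosen[i] lists the hypotheses for
    utterance i that the lexicon makes true."""
    words = sorted({w for utterance, _ in corpus for w in utterance})
    options = sorted(concepts) + [BOTTOM]
    found = []
    for choice in product(options, repeat=len(words)):
        lexicon = dict(zip(words, choice))
        chosen = []
        for utterance, hypotheses in corpus:
            ok = [s for s in hypotheses if toy_U(lexicon, utterance, s)]
            if not ok:  # this disjunction fails, so the lexicon is rejected
                break
            chosen.append(ok)
        else:
            found.append((lexicon, chosen))
    return found

# Each utterance is paired with every object in view as a candidate meaning.
corpus = [
    (("the", "ball"), [{"BALL"}, {"TRUCK"}]),
    (("the", "ball"), [{"BALL"}, {"DOLL"}]),
    (("the", "truck"), [{"TRUCK"}, {"DOLL"}]),
    (("the", "truck"), [{"TRUCK"}, {"BALL"}]),
]
solutions = consistent_lexica(corpus, {"BALL", "TRUCK", "DOLL"})
lexicon, chosen = solutions[0]
print(len(solutions), lexicon["ball"], lexicon["truck"])  # 1 BALL TRUCK
print(chosen)  # [[{'BALL'}], [{'BALL'}], [{'TRUCK'}], [{'TRUCK'}]]
```

No single utterance identifies its referent, yet exactly one lexicon is consistent with all four, and it selects one hypothesis per utterance.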

The above strategy still makes some residual assumptions about the input to the learner. It requires
that each of the input utterances be grammatical in the language to be learned. This is a standard
assumption in the field of language acquisition modeling. It also requires that the learner postulate
the correct meaning for each utterance as one of the hypothesized meanings for that utterance. The
learner would fail to converge to the correct grammar and lexicon if either of these requirements were
not met. Furthermore, the strategy becomes intractable if the set of hypothesized meanings paired with
each input utterance grows very large. Thus, this strategy is feasible only if the learner possesses some
way of narrowing the set of hypothesized meanings using some criterion of salience. Potential solutions
to these issues are discussed in section 5.2.
The key claim made in this thesis is that an appropriately constraining theory of universal linguistic
knowledge, combined with a large corpus of utterances paired with possible meanings, is sufficient
to uniquely determine a language-specific grammar and lexicon, using cross-situational learning. Using
cross-situational learning, there is no problem of regress. Unlike other recent proposals (cf. Pinker 1984),
this strategy makes no assumption that some language-specific knowledge must be acquired by
unspecified means before acquiring other language-specific knowledge.^

Let me point out how the above strategy differs from the traditional folklore account of language
acquisition. The traditional account claims that children learn a word's meaning by observing situations
depicting its use. Presumably, a child hears the word ball while being shown a ball and learns to pair
the word ball with the concept ball. For the traditional approach to work, the child must be able
to unambiguously pair a word with its concept. This requires that there be at least one situation to
which the child is exposed where (a) no other words are uttered along with ball while in the presence
of balls, and (b) no other objects are present which are potential referents of the word ball. Otherwise,
a child hearing Pick up the ball in the presence of a ball and a truck could pair pick with ball, ball
with truck, or even worse, pick with truck. While most children are undoubtedly exposed to some
situations where a single word is uttered in the context of a single salient referent, it seems unlikely that
the language acquisition device, robust as it is, could be relying on this strategy given the fleetingly
rare opportunities for its use. The cross-situational strategy outlined in this thesis does not make such
restrictive assumptions about the nature of the input to the language acquisition device.

2.2  Outline 

The remainder of part I of this thesis is divided into three chapters. Chapter 3 motivates the need for
cross-situational learning by presenting two small examples, fully worked through by hand, which
illustrate how cross-situational techniques work and how they can be more powerful than alternative
approaches. Before presenting the details of cross-situational learning, chapter 3 first covers some
preliminary background material. It discusses a particular semantic linking rule, namely composition by
substitution, and how to apply that rule in reverse. Inverse linking, which I call fracturing, plays a central
role in cross-situational semantic learning. Chapter 4 then presents three implemented systems which
apply cross-situational strategies to successively more sophisticated linguistic theories which make fewer
and fewer assumptions about the nature of the linguistic input and the child's prior language-specific
knowledge. Chapter 5 compares the cross-situational approach to several competing language acquisition
theories which do not use cross-situational techniques. It also summarizes the claims made and results
reported in part I of this thesis, discussing current limitations and areas for future work.


^Except perhaps the language-specific knowledge needed to acoustically segment utterances and recognize words. It
may be possible to extend the cross-situational learning techniques presented in this thesis to simultaneously acquire such
knowledge as well.
Chapter 3


Cross-Situational  Learning 


Section  5.1  will  review  a  number  of  competing  approaches  to  language  bootstrapping.  Many  of  the 
approaches  reviewed  use  trigger-based  strategies.  Trigger-based  strategies  attempt  to  learn  linguistic 
facts  by  observing  isolated  utterances.  There  is  an  alternative  to  trigger-based  learning.  Rather  than 
attempting  to  glean  a  linguistic  fact  from  a  single  utterance  or  utterance-observation  pair,  it  is  possible  to 
try  to  find  those  linguistic  facts  that  are  consistent  across  multiple  utterances  and  utterance-observation 
pairs. I will call such techniques cross-situational learning. These techniques allow the learner to acquire
partial  knowledge  from  ambiguous  situations  and  combine  such  partial  knowledge  across  situations  to 
infer  a  unique  language  model  despite  the  ambiguity  in  the  individual  isolated  situations. 

There are a number of different techniques, some stronger and some weaker, that all fall within the
general framework of cross-situational learning. The similarities and differences between these
techniques, as well as the power of the general approach, are best illustrated by way of several small
examples. This chapter presents two examples of cross-situational learning. They are designed for expository
purposes, to characterize in a simple way the techniques used by more complex implementations.
Accordingly, they utilize simple linguistic theories and make use of some prior language-specific knowledge
in the form of a fixed context-free grammar for the language being learned. In chapter 4, I present
three implemented systems that incorporate more substantive linguistic theories. Some of these systems
require less prior language-specific knowledge than the simple pedagogical examples discussed in this
chapter.

Before  presenting  the  examples,  I  will  first  discuss  fracturing,  a  key  technique  used  in  both  the 
examples  and  the  implemented  systems  to  be  described.  Fracturing  is  a  way  of  running  the  linking  rules 
in  reverse.  Linking  rules  are  normally  conceived  of  as  a  means  for  combining  the  meanings  of  words  into 
the  meanings  of  utterances  comprising  those  words.  During  language  acquisition,  the  learner  is  faced 
with  the  opposite  task.  After  pairing  utterances  with  potential  meanings  derived  from  the  non-linguistic 
context  of  those  utterances,  the  learner  must  pull  apart  an  utterance  meaning  to  map  fragments  of  that 
meaning  to  individual  words  in  the  utterance.  The  next  section  will  present  a  technique  for  running  a 
particular  linking  rule  in  reverse,  namely  the  linking  rule  proposed  by  Jackendoff  (1983).  Sections  3.2 
and  3.3  will  then  present  two  fully  worked-out  examples  of  cross-situational  learning  in  action. 


3.1  Linking  and  Fracturing 

Throughout much of part I of this thesis, I will represent meanings as terms, i.e. expressions
composed of primitive constant and function symbols. For expository purposes, I will use primitives
taken primarily from Jackendoff's (1983) conceptual structure notation, though I will extend this
set arbitrarily as needed.^ Thus typical meaning expressions will include GO(cup, FROM(John))
and SEE(John, Mary). None of the techniques in part I of this thesis attribute any interpretation
to the primitives. In every way, the meaning expression GO(cup, FROM(John)) is treated the same
as f(a, g(b)).^

Variable-free meaning expressions such as those given above will denote the meanings of whole
utterances. The meanings of utterance fragments in general, and words in particular, will be represented as
meaning expression fragments that may contain variables as placeholders for unfilled portions of that
fragment. Thus, the word from might have the meaning FROM(x) while the word John might have the
meaning John.^ Crucial to many of the techniques discussed in part I of this thesis is a particular
linking rule used to combine the meanings of words to form the meanings of phrases and whole utterances.
This linking rule is adopted by numerous authors including Jackendoff (1983, 1990), Pinker (1989),
and Dorr (1990a, 1990b). Informally, the linking rule forms the meaning of the prepositional phrase
from John by combining FROM(x) with John to form FROM(John).

This linking rule can be stated more formally as follows. Each node in a parse tree is assigned an
expression to represent its meaning. The meaning of a terminal node is taken from the lexical entry
for the word constituting that node. The meaning of a non-terminal node is derived from the meanings
of its children. Every non-terminal node u has exactly one distinguished child called its head. The
remaining children are called the complements of the head. The meaning of u is formed by substituting
the meanings of each of the complements for all occurrences of some variable in the meaning of the
head. To avoid the possibility of variable capture, without adding the complexity of a variable renaming
process, we require that the meaning expression fragments associated with complements be
variable-free. Notice that this rule does not stipulate which complements substitute for which variables. Thus
if GO(x, TO(y)) is the meaning of the head of some phrase, and John is the meaning of its complement,
the linking rule can produce either GO(x, TO(John)) or GO(John, TO(y)) as the meaning of the phrase.
The only restriction on linking is that the head meaning must contain at least as many distinct variables
as there are complements.
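The rule at a single node can be sketched in Python as follows. The term encoding is an assumption made for illustration: constants are strings, a compound term is a tuple whose first element is the function symbol, and variables are strings beginning with ?. Because the rule does not say which complement fills which variable, the hypothetical function link tries every assignment of complements to distinct head variables.

```python
from itertools import permutations

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def variables(t):
    """Distinct variables of t, in left-to-right order of first occurrence."""
    if is_var(t):
        return [t]
    found = []
    if isinstance(t, tuple):
        for arg in t[1:]:
            for v in variables(arg):
                if v not in found:
                    found.append(v)
    return found

def substitute(t, var, value):
    """Replace every occurrence of var in t by value."""
    if t == var:
        return value
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, var, value) for a in t[1:])
    return t

def link(head, complements):
    """All meanings the substitution-based linking rule allows for a node."""
    if any(variables(c) for c in complements):
        raise ValueError("complement meanings must be variable-free")
    results = []
    # Each complement fills every occurrence of one distinct head variable.
    # If the head has too few variables, permutations yields nothing.
    for chosen in permutations(variables(head), len(complements)):
        meaning = head
        for var, comp in zip(chosen, complements):
            meaning = substitute(meaning, var, comp)
        if meaning not in results:
            results.append(meaning)
    return results

head = ("GO", "?x", ("TO", "?y"))
print(link(head, ["John"]))
# [('GO', 'John', ('TO', '?y')), ('GO', '?x', ('TO', 'John'))]
print(link(("FROM", "?x"), ["John", "Mary"]))  # [] (too few variables)
```

Both meanings mentioned above are produced for GO(x, TO(y)) with the complement John, and a head with fewer distinct variables than complements yields no meaning at all.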

Some authors propose variants of the above linking rule that further specify which variables are
linked with which argument positions. For example, Pinker (1989) stipulates that the x in GO(x, y)
is always linked to the direct internal argument. Irrespective of whether this is true, either for English
specifically or cross-linguistically in general, I refrain from adopting such restrictions here for
two reasons. First, the algorithms presented in part I of this thesis apply generally to any expressions
denoting meaning. They transcend a particular representation such as Jackendovian conceptual
structures. Linking restrictions such as those adopted by Pinker apply only to expressions constructed out
of Jackendovian primitives. Since, for reasons to be discussed in part II of this thesis, the Jackendovian
representation is inadequate, it does not make sense to base a learning theory on restrictions which are
particular to that representation. Second, the learning algorithms presented here are capable of learning
without making such restrictions. In fact, such restrictions could be learned if they were indeed true.
The standard motivation for assuming a faculty to be innate is the poverty of stimulus argument. This

^In part II of this thesis, I will discuss the inadequacies of both Jackendovian conceptual structure representations and
substitution-based linking rules. Since much of the work in part I predates the work described in part II, it was
formulated using the Jackendovian representation and associated linking rule as a matter of expedience, since that is what
was prominent in the literature at the time. In more recent work, such as that described in section 4.3, I abandon the
Jackendovian representation in favor of simple thematic role assignments, a much weaker form of semantic representation.
This also entails abandoning substitution-based linking in favor of θ-marking. In the future I hope to incorporate the more
comprehensive semantic representations discussed in part II of this thesis into the techniques described in part I.

^This mitigates, to some extent, the inadequacies of the Jackendovian representation. Nothing in part I of this thesis
relies  on  the  semantic  content  of  a  particular  set  of  primitives.  The  techniques  described  here  apply  equally  well  to  any 
representation  provided  that  the  representation  adheres  to  a  substitution-based  linking  rule.  The  inadequacies  of  such  a 
linking  rule  still  limit  the  applicability  of  these  techniques,  however. 

^Throughout this chapter and the next I will use the somewhat pretentious phrase ‘the meaning of x’ to mean ‘the
meaning expression associated with x’.
[Figure 3.1: derivation tree; each constituent is shown with its meaning, head child listed first.
  John slid the cup from Mary to Bill: CAUSE(John, GO(cup, [Path FROM(Mary), TO(Bill)]))
    slid the cup from Mary to Bill: CAUSE(x, GO(cup, [Path FROM(Mary), TO(Bill)]))
      slid: CAUSE(x, GO(y, [Path z, w]))
      the cup: cup (cup: cup; the: ⊥)
      from Mary: FROM(Mary) (from: FROM(x); Mary: Mary)
      to Bill: TO(Bill) (to: TO(x); Bill: Bill)
    John: John]

Figure 3.1: A derivation of the meaning of the utterance John slid the cup from Mary to Bill from
the meanings of its constituent words using the linking rule proposed by Jackendoff.


argument is falsified most strongly with a demonstration that something is learnable. It is important
not to get carried away with our rationalist tendencies, making unwarranted innateness assumptions in
light of the rare observation of something that is indeed learnable by empiricist methods.

Some words, such as determiners and auxiliaries, appear not to have a meaning that can be easily
characterized as a meaning expression to be combined by the above linking rule. To provide an escape
hatch for semantic notions that fall outside the system described above, we provide the distinguished
meaning symbol ⊥. Typically, words such as the will bear ⊥ as their meaning. The linking rule is
extended so that any complements that have ⊥ as their meaning are not substituted into the meaning of
the head. This allows forming cup as the meaning of the cup when the and cup have ⊥ and cup as their
respective meanings. Using this linking rule, the meanings of phrases, and ultimately of entire utterances,
can be derived from the meanings of their constituent words, given a parse tree annotated as to which
children are heads and which are complements. A sample derivation is shown in figure 3.1. Note that
the linking rule is ambiguous and can produce multiple meanings, even in the absence of lexical and
structural ambiguity, since it does not specify which variables are linked to which complements. Also
note that the aforementioned linking rule addresses only issues of argument structure. No attempt is
made to support other aspects of compositional semantics such as quantification.
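The extended rule can be sketched as a recursive derivation over a head-annotated parse tree. The classes Leaf and Node, the ?-prefixed variables, the string ⊥ for the null meaning, and the PATH function symbol standing in for Jackendoff's [Path ...] constituent are all assumptions of this sketch. Applied to the tree of figure 3.1, it derives the intended meaning but, as noted above, also every other assignment of complements to the variables of slid.

```python
from dataclasses import dataclass
from itertools import permutations, product

BOTTOM = "⊥"  # meaning of words, such as "the", that are never substituted

@dataclass
class Leaf:
    word: str
    meaning: object

@dataclass
class Node:
    head: object       # the distinguished head child (Leaf or Node)
    complements: list  # the remaining children

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def variables(t):
    """Distinct variables of t in left-to-right order."""
    if is_var(t):
        return [t]
    found = []
    if isinstance(t, tuple):
        for arg in t[1:]:
            for v in variables(arg):
                if v not in found:
                    found.append(v)
    return found

def substitute(t, var, value):
    if t == var:
        return value
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, var, value) for a in t[1:])
    return t

def link(head, complements):
    comps = [c for c in complements if c != BOTTOM]  # ⊥ complements are skipped
    if any(variables(c) for c in comps):
        return []  # complements must be variable-free
    results = []
    for chosen in permutations(variables(head), len(comps)):
        meaning = head
        for var, comp in zip(chosen, comps):
            meaning = substitute(meaning, var, comp)
        if meaning not in results:
            results.append(meaning)
    return results

def meanings(tree):
    """All meanings the extended linking rule derives for a tree."""
    if isinstance(tree, Leaf):
        return [tree.meaning]
    results = []
    for head in meanings(tree.head):
        for comps in product(*(meanings(c) for c in tree.complements)):
            for m in link(head, list(comps)):
                if m not in results:
                    results.append(m)
    return results

slid = Leaf("slid", ("CAUSE", "?x", ("GO", "?y", ("PATH", "?z", "?w"))))
vp = Node(slid, [
    Node(Leaf("cup", "cup"), [Leaf("the", BOTTOM)]),
    Node(Leaf("from", ("FROM", "?x")), [Leaf("Mary", "Mary")]),
    Node(Leaf("to", ("TO", "?x")), [Leaf("Bill", "Bill")]),
])
sentence = Node(vp, [Leaf("John", "John")])

results = meanings(sentence)
intended = ("CAUSE", "John", ("GO", "cup", ("PATH", ("FROM", "Mary"), ("TO", "Bill"))))
print(intended in results, len(results))  # True 24
```

The 24 results are the 4 × 3 × 2 ways of assigning the three complements of slid to its four variables, with John filling the remaining one.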

Substitution-based linking rules are not new. They are widely discussed in the literature (cf.
Jackendoff 1983, 1990, Pinker 1989, and Dorr 1990a, 1990b). The techniques in this thesis explore a novel
application of such linking rules: the ability to use them in reverse. Traditionally, compositional
semantics is viewed as a process for deriving utterance meanings from word meanings. This thesis will explore
the opposite possibility: deriving the meanings of individual words from the meanings of utterances
containing those words. I will refer to this inverse linking process as fracturing.

Fracturing is best described by way of an example. Assume some node in some parse tree has the
meaning GO(cup, TO(John)). Furthermore, assume that the node has two children. In this case there
are four possibilities for assigning meanings to the children.
Head                 Complement
GO(x, TO(John))      cup
GO(cup, x)           TO(John)
GO(cup, TO(x))       John
GO(cup, TO(John))    ⊥

Note specifically the last possibility of assigning ⊥ as the meaning of the complement. This option will
always be present when fracturing any node. The above fracturing process can be applied recursively,
starting at the root node of a tree and proceeding toward its leaves, to derive possible word meanings from
the meaning of a whole utterance. More formally, fracturing a node u is accomplished by the following
algorithm.

Algorithm To fracture the meaning expression associated with a node u into meaning expression
fragments associated with the head of u and its complements:

Let e be the meaning of u. For each complement, either assign ⊥ as the meaning of that
complement or perform the following two steps.

1. Select some subexpression s of e and assign it as the meaning of that complement. The
subexpression s must not contain any variables introduced in step 2.

2. Replace one or more occurrences of s in e with a new variable.

After all complements have been assigned meanings, assign e as the meaning of the head. □
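A Python sketch of the fracturing algorithm follows, using a hypothetical term encoding (strings for constants, tuples for compound terms, ?-prefixed strings for variables, the string ⊥ for the null meaning). One interpretive choice is an assumption: the subexpression selected in step 1 is taken to be a proper subexpression, so the head never becomes a bare variable; this reproduces exactly the four rows of the table for GO(cup, TO(John)). Occurrences are tracked by their paths from the root so that any non-empty subset of them can be replaced, and fresh variable names avoid any variables already present in the node meaning.

```python
from itertools import combinations

BOTTOM = "⊥"

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def has_var(t):
    return is_var(t) or (isinstance(t, tuple) and any(has_var(a) for a in t[1:]))

def subterms(t, path=()):
    """Yield (path, subterm) for t and every argument subterm of t."""
    yield path, t
    if isinstance(t, tuple):
        for i, arg in enumerate(t[1:], start=1):
            yield from subterms(arg, path + (i,))

def replace_at(t, path, new):
    """Return t with the subterm at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace_at(t[i], path[1:], new),) + t[i + 1:]

def fresh_var(t):
    """A variable name that does not already occur in t."""
    used = {s for _, s in subterms(t) if is_var(s)}
    n = 0
    while f"?v{n}" in used:
        n += 1
    return f"?v{n}"

def fracture(e, k):
    """All (head meaning, complement meanings) splits of e for k complements."""
    results = set()

    def step(cur, i, comps):
        if i == k:
            results.add((cur, tuple(comps)))  # what remains is the head meaning
            return
        step(cur, i + 1, comps + [BOTTOM])  # the complement may mean ⊥
        # Steps 1 and 2: a proper, variable-free subexpression s, with a
        # non-empty subset of its occurrences replaced by a new variable.
        occurrences = {}
        for path, s in subterms(cur):
            if path and not has_var(s):
                occurrences.setdefault(s, []).append(path)
        var = fresh_var(cur)
        for s, paths in occurrences.items():
            for r in range(1, len(paths) + 1):
                for chosen in combinations(paths, r):
                    new = cur
                    for p in chosen:
                        new = replace_at(new, p, var)
                    step(new, i + 1, comps + [s])

    step(e, 0, [])
    return results

meaning = ("GO", "cup", ("TO", "John"))
splits = fracture(meaning, 1)
print(len(splits))  # 4
print((("GO", "cup", ("TO", "?v0")), ("John",)) in splits)  # True
```

With two complements, the same expression yields 11 splits, since either complement may also take ⊥.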

As stated above, the fracturing process is mediated by a parse tree annotated with head-child
markings. Given a meaning expression e, one can enumerate all meaning expression fragments which can
possibly link together to form e, irrespective of any parse tree for deriving e. Such a meaning fragment is called
a submeaning of e. For example, the following are all of the submeanings of GO(cup, FROM(John)):

GO(cup,FROM(John))
cup
GO(x,FROM(John))
GO(x,FROM(y))
GO(x,y)
GO(cup,FROM(x))
GO(cup,x)
FROM(John)
FROM(x)
John
⊥

If an utterance has e as its meaning, then every word in that utterance must have a submeaning of e as its meaning. The set of submeanings for a meaning expression e can be derived by the following algorithm.

Algorithm  To  enumerate  all  submeanings  of  a  meaning  expression  e: 

Let  s  be  some  subexpression  of  e.  Repeat  the  following  two  steps  an  arbitrary  number  of 
times. 

1.  Select  some  subexpression  t  of  s  not  containing  any  variables  introduced  in  step  2. 

2.  Replace  one  or  more  occurrences  of  t  in  s  with  a  new  variable. 

Upon completion, s is a possible submeaning of e. Furthermore, ⊥ is a possible submeaning of every expression. □
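The submeaning enumeration admits a similar sketch under the same hypothetical tuple encoding. Each state is renamed to a canonical form (?0, ?1, ... in order of first occurrence) so that results differing only in variable names are counted once. Two assumptions are stated in the comments: e is variable-free, and, consistent with the list given for GO(cup,FROM(John)), a whole expression is never replaced by a bare variable. The names `generalizations` and `submeanings` are illustrative.

```python
from itertools import combinations

BOTTOM = "⊥"


def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def subterms(e, path=()):
    """Yield (path, subterm) pairs for e and all of its subexpressions."""
    yield path, e
    if isinstance(e, tuple):
        for i, arg in enumerate(e[1:], start=1):
            yield from subterms(arg, path + (i,))


def replace_at(e, path, new):
    if not path:
        return new
    i = path[0]
    return e[:i] + (replace_at(e[i], path[1:], new),) + e[i + 1:]


def canonical(e):
    """Rename variables to ?0, ?1, ... in order of first occurrence."""
    names = {}

    def walk(t):
        if is_var(t):
            if t not in names:
                names[t] = f"?{len(names)}"
            return names[t]
        if isinstance(t, tuple):
            return (t[0],) + tuple(walk(a) for a in t[1:])
        return t

    return walk(e)


def generalizations(s):
    """All results of repeatedly replacing variable-free subexpressions of s."""
    seen, frontier = {s}, [s]
    while frontier:
        state = frontier.pop()
        # States are canonical, so ?k (k = number of variables) is fresh.
        fresh = f"?{len({t for _, t in subterms(state) if is_var(t)})}"
        occurrences = {}
        for path, t in subterms(state):
            # Proper subexpressions only (assumption), none containing a variable.
            if path and not any(is_var(u) for _, u in subterms(t)):
                occurrences.setdefault(t, []).append(path)
        for t, paths in occurrences.items():
            for r in range(1, len(paths) + 1):      # one or more occurrences
                for chosen in combinations(paths, r):
                    new = state
                    for p in chosen:
                        new = replace_at(new, p, fresh)
                    new = canonical(new)
                    if new not in seen:
                        seen.add(new)
                        frontier.append(new)
    return seen


def submeanings(e):
    """⊥ plus every generalization of every subexpression of a ground e."""
    result = {BOTTOM}
    for _, s in subterms(e):
        result |= generalizations(s)
    return result


print(len(submeanings(("GO", "cup", ("FROM", "John")))))  # 11
```

On GO(cup,FROM(John)) it returns exactly the eleven submeanings listed above. Encoding [path x, y] as a two-argument functor "path", it returns 37 submeanings for SLIDE(John,[path FROM(Bill),TO(Mary)]).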

Both the fracturing algorithm and the algorithm for enumerating all submeanings of a given meaning expression will play a prominent role throughout the remainder of part 1 of this thesis.


3.2  Learning  Syntactic  Categories 

Consider the following problem. Suppose that a learner is given a fixed context-free grammar along with a corpus of utterances generated by that grammar.* Given such information, the learner must derive a lexicon mapping the words in the corpus to their syntactic categories. No non-linguistic information is given to the learner.

This  problem  is  typified  by  the  following  example.  Suppose  that  the  learner  is  given  the  following 
context-free  grammar. 

S → NP VP
NP → {D} N
VP → V {NP {NP}}

I will refer to this grammar as G1. Now suppose that the learner hears the utterance John saw Mary. Since G1 generates only two three-word terminal strings, namely N V N and D N V, the learner can conclude that John must be either a noun or a determiner, saw either a verb or a noun, and Mary either a noun or a verb, given their respective positions in the input string. If the learner later hears the utterance Mary ate breakfast, she can perform a similar analysis and conclude that Mary must be a noun, since only nouns can appear as both the first and third words of a three-word utterance.
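This per-position reasoning can be sketched in a few lines of Python: enumerate the terminal category strings G1 generates for each utterance length, then intersect, for each word, the categories permitted at the positions where it occurs. The function names are hypothetical, and hard-coding G1's string set works only because G1 generates a finite language.

```python
# Grammar G1:  S -> NP VP,  NP -> {D} N,  VP -> V {NP {NP}}
NPS = [("N",), ("D", "N")]
VPS = [("V",)] + [("V",) + a for a in NPS] + [("V",) + a + b for a in NPS for b in NPS]


def terminal_strings(length):
    """All category strings of the given length that G1 generates."""
    return {np + vp for np in NPS for vp in VPS if len(np + vp) == length}


def weak_learn(corpus):
    """Intersect, word by word, the categories allowed at each position."""
    lexicon = {}
    for utterance in corpus:
        words = utterance.split()
        analyses = terminal_strings(len(words))
        for i, word in enumerate(words):
            allowed = {analysis[i] for analysis in analyses}
            lexicon[word] = lexicon.get(word, allowed) & allowed
    return lexicon


lexicon = weak_learn(["John saw Mary", "Mary ate breakfast"])
print(lexicon["Mary"])            # {'N'}
print(sorted(lexicon["John"]))    # ['D', 'N']
```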

This analysis is based on one crucial assumption: that each word bears only one syntactic category. I will call this assumption the monosemy constraint. Clearly language contains polysemous words. I will discuss potential ways of relaxing the monosemy constraint in section ~>.'2.

I will refer to the above technique as weak cross-situational learning. In the above example, weak cross-situational learning constrains only the syntactic category of Mary, and not any of the remaining words, since only Mary appears in multiple utterances. The learner can nonetheless perform more aggressive inference given the above information. Once the learner infers that Mary is a noun, she can rule out D N V as a possible analysis for Mary ate breakfast, leaving only the N V N analysis. Thus the learner can also infer that ate is a verb and breakfast is a noun. Furthermore, if the learner were able to reanalyze previous utterances, she could perform a similar analysis on John saw Mary and determine that John is a noun and saw is a verb. The given grammar and corpus permit only one consistent analysis and thus entail a unique lexicon. I will call the process of finding such a consistent analysis strong cross-situational learning. In the above example, weak cross-situational learning could never converge to a unique lexicon since a noun can appear anywhere a determiner can appear. Thus strong cross-situational learning is strictly more powerful than weak cross-situational learning.
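Strong cross-situational learning, by contrast, searches over whole lexicons. The brute-force sketch below (hypothetical names, exponential in the number of distinct words) enumerates every monosemous assignment of categories to words and keeps those under which every utterance is a string generated by G1. It only makes the definition concrete; it is not the satisfiability-based formulation described below.

```python
from itertools import product

# Grammar G1:  S -> NP VP,  NP -> {D} N,  VP -> V {NP {NP}}
NPS = [("N",), ("D", "N")]
VPS = [("V",)] + [("V",) + a for a in NPS] + [("V",) + a + b for a in NPS for b in NPS]
SENTENCES = {np + vp for np in NPS for vp in VPS}   # finite, so listed outright
CATEGORIES = ("N", "V", "D")


def strong_learn(corpus):
    """Return every monosemous lexicon under which all utterances parse."""
    utterances = [u.split() for u in corpus]
    words = sorted({w for u in utterances for w in u})
    lexicons = []
    for assignment in product(CATEGORIES, repeat=len(words)):
        lexicon = dict(zip(words, assignment))   # exactly one category per word
        if all(tuple(lexicon[w] for w in u) in SENTENCES for u in utterances):
            lexicons.append(lexicon)
    return lexicons


print(strong_learn(["John saw Mary", "Mary ate breakfast"]))
# [{'John': 'N', 'Mary': 'N', 'ate': 'V', 'breakfast': 'N', 'saw': 'V'}]
```

For the two-utterance corpus it returns the single lexicon derived above; for John saw Mary alone it returns two lexicons, one per analysis.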

As formulated above, cross-situational learning requires the learner to remember prior utterances. This may not be cognitively plausible. An alternative formulation, however, alleviates this drawback. A lexical entry can be viewed as a proposition, for example

$$\mathrm{category}(\mathit{John}) = \mathrm{N}$$

A lexicon is normally thought of as a set of lexical entries. This can be viewed as a conjunction of propositions, for example

$$\mathrm{category}(\mathit{John}) = \mathrm{N} \wedge \mathrm{category}(\mathit{saw}) = \mathrm{V} \wedge \mathrm{category}(\mathit{Mary}) = \mathrm{N}$$

*Clearly children do not have prior access to such language-specific information. This example is simplified for expository purposes. The Davra and Kenunia systems discussed in chapter 4 do not assume prior access to a language-specific grammar.
The concept of a lexicon formula can be extended to include disjunctions of propositions. Such disjunctive lexicon formulae can represent intermediate states of partial information about the lexicon being learned. Thus after hearing the utterance John saw Mary, the learner can form the following disjunctive lexicon formula.

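$$\begin{aligned}
&(\mathrm{category}(\mathit{John}) = \mathrm{N} \wedge \mathrm{category}(\mathit{saw}) = \mathrm{V} \wedge \mathrm{category}(\mathit{Mary}) = \mathrm{N}) \vee {} \\
&(\mathrm{category}(\mathit{John}) = \mathrm{D} \wedge \mathrm{category}(\mathit{saw}) = \mathrm{N} \wedge \mathrm{category}(\mathit{Mary}) = \mathrm{V})
\end{aligned}$$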

The learner can discard the utterance and retain only the derived lexicon formula. Upon hearing each new utterance, the learner can form a new lexicon formula for that utterance and conjoin it with the previous lexicon formula. In this case, the entire lexicon formula would be a conjunction of disjunctions of conjunctions of lexical entry propositions. Further formulae representing the monosemy constraint can be conjoined with the lexicon formula. Such monosemy formulae take the following form

$$\neg(\mathrm{category}(\mathit{saw}) = \mathrm{N} \wedge \mathrm{category}(\mathit{saw}) = \mathrm{V})$$

which states that no word can bear two different categories. Strong cross-situational learning can then be seen as finding truth assignments to the lexical entry propositions which satisfy the resulting lexicon formula. Though determining propositional satisfiability is NP-complete, well-known heuristics, such as boolean constraint propagation, can usually solve such problems efficiently in practice (cf. McAllester unpublished, 1978, 1980, 1982, and Zabih and McAllester 1988).

The difference between weak and strong cross-situational learning can be seen as generating different forms of lexicon formulae. Given the utterance John saw Mary, weak cross-situational learning can be viewed as constructing the following lexicon formula

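$$\begin{aligned}
&(\mathrm{category}(\mathit{John}) = \mathrm{N} \vee \mathrm{category}(\mathit{John}) = \mathrm{D}) \wedge {} \\
&(\mathrm{category}(\mathit{saw}) = \mathrm{V} \vee \mathrm{category}(\mathit{saw}) = \mathrm{N}) \wedge {} \\
&(\mathrm{category}(\mathit{Mary}) = \mathrm{N} \vee \mathrm{category}(\mathit{Mary}) = \mathrm{V})
\end{aligned}$$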

instead of the formula described previously. It is easy to see that the lexicon formula created for weak cross-situational learning is linear in the size of the input utterance. The naive approach for generating the lexicon formula corresponding to strong cross-situational learning would generate a disjunct for each possible parse. Since there could be an exponential number of parses, this would appear intractable.

It is possible, however, to use a variant of the CKY algorithm (Kasami 1965, Younger 1967) to share common subformulae and generate, in polynomial time, a lexicon formula whose size is polynomial in the length of the input utterance. This is done as follows. Lexical entry propositions of the form $\ell_{wc}$ are created for each word $w$ and syntactic category $c$. Next, for each utterance, propositions of the form $p_{ijc}$ are created for each syntactic category $c$ and each $0 \le i < j \le n$, where $n$ is the length of the utterance. Intuitively, the proposition $p_{ijc}$ is true if the subphrase from position $i$ through position $j$ in the utterance can be parsed as category $c$. For each binary branching rule $A \rightarrow B\,C$ in the grammar,† and for each $0 \le i < j \le n$, propositional formulae of the form

$$p_{ijA} \leftrightarrow \bigvee_{k=i+1}^{j-1} \left( p_{ikB} \wedge p_{kjC} \right)$$

are conjoined to form a large formula. To this one conjoins all formulae of the form

$$p_{i(i+1)C} \leftrightarrow \ell_{wC}$$

where $C$ is a category, $0 \le i < n$, and $w$ is the word at position $i+1$, as well as asserting the single proposition $p_{0nS}$, where $S$ is the root category of the grammar. Formulae such as these are created for each utterance in the corpus and conjoined together. Finally, monosemy formulae over the $\ell_{wc}$ propositions are added to enforce the monosemy constraint. This whole formula can be converted to conjunctive normal form, yielding a formula whose size is polynomial in the length of the corpus.‡ Satisfying assignments to this formula constitute word-to-category mappings that are consistent with both the corpus and the grammar.

†Any context-free grammar can be converted into a weakly equivalent grammar containing only binary branching rules. This conversion process, known as conversion to Chomsky Normal Form, does not affect the category learning process.


3.3  Learning Syntactic Categories and Word Meanings Together

The previous example illustrated the use of weak and strong cross-situational techniques for learning syntactic categories from linguistic information alone, without any reference to semantics. It is possible to extend these techniques to learn both syntactic and semantic information when given both linguistic and non-linguistic input. As the next example will illustrate, non-linguistic input can help not only in the acquisition of word meanings but also in learning syntactic categories. Furthermore, syntactic knowledge can aid the acquisition of word meanings. The example will demonstrate how strong cross-situational learning, applied to a combined syntactic and semantic theory, is more powerful than either weak or strong cross-situational learning applied to either syntax or semantics alone.

Consider a learner who possesses the following context-free grammar.

S → NP VP
NP → {D} N
VP → V {NP {NP}} PP*
PP → P NP


I will refer to this grammar as G2. Now suppose that the learner hears the following five utterances.§‖


s1   John fled from the dog.          FLEE(John,FROM(dog))
s2   John walked from a corner.       WALK(John,FROM(corner))
s3   Mary walked to the corner.       WALK(Mary,TO(corner))
s4   Mary ran to a cat.               RUN(Mary,TO(cat))
s5   John slid from Bill to Mary.     SLIDE(John,[path FROM(Bill),TO(Mary)])


Each utterance is paired with its correct meaning as derived by the learner from observation of its non-linguistic context.# Furthermore, I will assume that the learner knows that each of the input utterances is generated by G2 and that the meanings associated with each utterance are derived from the meanings of the words in that utterance via the syntax-mediated linking rule described in section 3.1. In this example, however, I assume that the learner does not know which syntactic categories constitute the heads of the rules in G2. Thus the learner must consider all possibilities. The task faced by the learner is to discern a lexicon that maps words both to their syntactic categories and to their meanings, so that the derived lexicon consistently allows the utterances to be generated by G2 and their associated meanings to be derived by the linking rule.

‡The size of the formula constructed for each utterance is cubic in the length of that utterance. Assuming a bound on utterance length, the size of the formula constructed is thus linear in the number of utterances and quadratic in the number of distinct words appearing in the corpus, due to the monosemy formulae.

§To reiterate, words in italics denote linguistic tokens while words in boldface or UPPER CASE denote semantic representations of word meanings. There is no prior correspondence between a linguistic token such as John and a semantic token such as John, even though they share the same spelling. They are treated as uninterpreted tokens. The task faced by the learner is to acquire the appropriate correspondences as word-to-meaning mappings.

‖For the purposes of this thesis, the notation [path x, y] can be viewed as a two-argument function which combines two paths to yield an aggregate path with the combined properties of the path arguments x and y.

#To simplify this example I will assume that the learner unambiguously knows the meaning of each utterance in the corpus. Techniques described in section 2.1 can be used to relax this assumption and allow referential uncertainty. Such techniques are incorporated in all of the implementations described in chapter 4.

Figure 3.2: All possible terminal category strings for five- and six-word utterances generated by grammar G2.

Consider first what the learner can glean by applying weak cross-situational techniques to the linguistic information alone. Each of the input utterances is five words long, except for the last utterance, which is six words long. There are seven possible terminal strings of length five and nine of length six. These are illustrated in figure 3.2.
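The counts of seven and nine terminal strings can be checked mechanically. The sketch below (hypothetical names) enumerates the category strings of G2 by length; because PP* is unbounded, PP sequences are cut off at the target length. It then applies the same per-position intersection used for G1, which yields the category sets shown in figure 3.3.

```python
# Grammar G2:  S -> NP VP,  NP -> {D} N,  VP -> V {NP {NP}} PP*,  PP -> P NP
NPS = [("N",), ("D", "N")]
OBJECTS = [()] + NPS + [a + b for a in NPS for b in NPS]


def pp_sequences(max_len):
    """Every sequence of zero or more PPs whose length is at most max_len."""
    sequences, frontier = [()], [()]
    while frontier:
        frontier = [s + ("P",) + np for s in frontier for np in NPS
                    if len(s) + 1 + len(np) <= max_len]
        sequences += frontier
    return sequences


def terminal_strings(length):
    """All category strings of the given length generated by G2."""
    return {subj + ("V",) + obj + pps
            for subj in NPS for obj in OBJECTS for pps in pp_sequences(length)
            if len(subj) + 1 + len(obj) + len(pps) == length}


def weak_learn(corpus):
    """Intersect, word by word, the categories allowed at each position."""
    lexicon = {}
    for utterance in corpus:
        words = utterance.split()
        analyses = terminal_strings(len(words))
        for i, word in enumerate(words):
            allowed = {a[i] for a in analyses}
            lexicon[word] = lexicon.get(word, allowed) & allowed
    return lexicon


corpus = ["John fled from the dog", "John walked from a corner",
          "Mary walked to the corner", "Mary ran to a cat",
          "John slid from Bill to Mary"]
print(len(terminal_strings(5)), len(terminal_strings(6)))  # 7 9
print(sorted(weak_learn(corpus)["to"]))                     # ['D', 'N', 'P']
```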

The syntactic category assignments produced by weak cross-situational learning are illustrated in figure 3.3. Note that weak cross-situational learning can uniquely determine only the syntactic categories of Mary, corner, cat, and dog. These are uniquely determined because they occur in utterance-final positions and G2 allows only nouns to appear as the last word of utterances of length greater than three. Furthermore, notice that in the above corpus, most of the words appear cross-situationally in the same position of an utterance of the same length. Thus the set intersection techniques of weak cross-situational learning offer little help here in reducing the possible category mappings. In fact, only the words Mary and to engender the intersection of two distinct category sets. Even here, though, one set is a subset of the other. Thus for this example, the intersection step of weak cross-situational learning provides no information beyond what each utterance yields in isolation.

Strong cross-situational learning can improve upon this somewhat, but not significantly. The fact that Mary is a noun rules out the first three analyses for both s3 and s4, since they require the first word to be a determiner. This implies that both walked and ran must be verbs, since the remaining four analyses all have verbs in second position. Discovering that walked is a verb allows the learner to rule out the first three analyses for s2, since they require a noun in second position. This allows the learner to infer that John must be a noun and from cannot be a verb. Since John is a noun, s1 cannot have the first three analyses and s5 cannot have the first four. Thus fled and slid must be verbs and Bill cannot be a determiner.

John:     [N, D]          {N, D} ∩ {N, D} ∩ {N, D}
fled:     [N, V]          {N, V}
from:     [N, V, P, D]    {N, V, P, D} ∩ {N, V, P, D} ∩ {N, V, P, D}
the:      [N, P, D]       {N, P, D} ∩ {N, P, D}
dog:      [N]             {N}
walked:   [N, V]          {N, V} ∩ {N, V}
a:        [N, P, D]       {N, P, D} ∩ {N, P, D}
corner:   [N]             {N}
Mary:     [N]             {N, D} ∩ {N, D} ∩ {N}
to:       [N, P, D]       {N, V, P, D} ∩ {N, V, P, D} ∩ {N, P, D}
ran:      [N, V]          {N, V}
cat:      [N]             {N}
slid:     [N, V]          {N, V}
Bill:     [N, P, D]       {N, P, D}

Figure 3.3: An illustration of the syntactic category assignments that weak cross-situational learning can infer for the sample corpus using linguistic information alone.

At this point the learner knows the syntactic categories of all of the words in the corpus except for from, to, the, a, and Bill. The words from, to, the, and a might still be either nouns, prepositions, or determiners, and Bill might be either a noun or a preposition. There are, however, additional cross-situational constraints between the possible category assignments of these words. Not all possible combinations are consistent with G2. One can construct a constraint satisfaction problem (CSP) whose solutions correspond to the allowable combinations. The variables of this CSP are the words from, to, the, and a. Each of these variables ranges over the categories N, D, and P. Define P(x,y) to be the constraint which is true if one of the last four analyses for five-word utterances allows category x to appear in third position at the same time that category y can appear in fourth position. Thus P(x,y) is true only for the pairs (D,N), (N,D), (N,P), and (P,D). Furthermore, define Q(x,y) to be the constraint which is true if one of the last five analyses for six-word utterances allows category x to appear in third position at the same time that category y can appear in fifth position. Thus Q(x,y) is true only for the pairs (D,D), (D,P), (N,D), (N,P), and (P,P). The allowed category mappings must satisfy the following constraint.


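$$P(\mathit{from},\mathit{a}) \wedge P(\mathit{from},\mathit{the}) \wedge P(\mathit{to},\mathit{a}) \wedge P(\mathit{to},\mathit{the}) \wedge Q(\mathit{from},\mathit{to})$$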

This  constraint  admits  only  three  solutions.  The  following  table  outlines  these  possible  simultaneous 
category  mappings  along  with  the  analyses  they  entail  for  each  of  the  five  utterances  in  the  corpus. 


from 

to 

the 

a 

*1 

S2 

«3 

S5 

N 

p 

D 

D 

(e) 

(e) 

(g) 

(g) 

(h) 

P 

p 

D 

D 

(g) 

(g) 

(g) 

(g) 

(i) 

D 

D 

N 

N 

(d) 

(d) 

(d) 

(d) 

(e) 
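The three rows of the table can be reproduced by brute force over the 3^4 = 81 assignments of N, D, and P to from, to, the, and a. In this sketch P and Q are transcribed as sets of allowed pairs directly from their definitions above; the function name `solve` is hypothetical. A general CSP solver would prune rather than enumerate, but with 81 candidates enumeration suffices.

```python
from itertools import product

# P(x, y): categories allowed in third and fourth position by the last four
# five-word analyses; Q(x, y): third and fifth position, last five six-word ones.
P = {("D", "N"), ("N", "D"), ("N", "P"), ("P", "D")}
Q = {("D", "D"), ("D", "P"), ("N", "D"), ("N", "P"), ("P", "P")}


def solve():
    """Enumerate all category assignments satisfying the conjoined constraint."""
    solutions = []
    for frm, to, the, a in product("NDP", repeat=4):
        if ((frm, a) in P and (frm, the) in P and (to, a) in P
                and (to, the) in P and (frm, to) in Q):
            solutions.append({"from": frm, "to": to, "the": the, "a": a})
    return solutions


for solution in solve():
    print(solution)
# {'from': 'N', 'to': 'P', 'the': 'D', 'a': 'D'}
# {'from': 'D', 'to': 'D', 'the': 'N', 'a': 'N'}
# {'from': 'P', 'to': 'P', 'the': 'D', 'a': 'D'}
```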

Thus the and a cannot be prepositions and to cannot be a noun. Furthermore, this analysis has shown that Bill must be a noun. In this example, strong cross-situational learning cannot, however, narrow down the possible syntactic categories for from, to, the, and a any further. Figure 3.4 shows consistent analyses where the and a can be nouns, to can be a determiner, and from can be either a noun or a determiner.

from can be an N:
John fled from the dog.          N V N D N
John walked from a corner.       N V N D N
Mary walked to the corner.       N V P D N
Mary ran to a cat.               N V P D N
John slid from Bill to Mary.     N V N N P N

from can be a D, to can be a D, the can be an N, a can be an N:
John fled from the dog.          N V D N N
John walked from a corner.       N V D N N
Mary walked to the corner.       N V D N N
Mary ran to a cat.               N V D N N
John slid from Bill to Mary.     N V D N D N

Figure 3.4: Analyses of the corpus which are consistent with the language model after strong cross-situational techniques have been applied to syntax, but which are nonetheless incorrect.

Cross-situational learning can be applied to semantics much in the same way as to syntax. Using the fracturing technique described in section 3.1, it is possible to enumerate all of the submeanings of the meaning expressions associated with each utterance in the corpus. These are illustrated in figure 3.5.

Applying weak cross-situational learning techniques, the learner can constrain the possible meanings of Mary to the intersection of the sets of submeanings for each of the utterances s3, s4, and s5, since Mary appears in each of these three utterances. Thus Mary must take on one of the meanings ⊥, Mary, or TO(x) to be consistent with these utterances. A similar analysis can narrow the possible meanings of the words a, the, John, walked, from, to, and corner, since each of these words appears in more than one utterance. Figure 3.6 gives the restricted sets of possible meanings derived for these seven words. Weak cross-situational learning cannot constrain the meaning of the remaining words since they each appear in only a single utterance in the corpus. Note that for this example, weak cross-situational learning applied to semantics has succeeded in uniquely determining the meaning of only two words, namely that a and the both mean ⊥.

s1: John fled from the dog.   FLEE(John,FROM(dog))
    FLEE(John,FROM(dog)), John, FLEE(x,FROM(dog)), FLEE(x,FROM(y)), FLEE(x,y),
    FLEE(John,FROM(x)), FLEE(John,x), FROM(dog), FROM(x), dog, ⊥

s2: John walked from a corner.   WALK(John,FROM(corner))
    WALK(John,FROM(corner)), John, WALK(x,FROM(corner)), WALK(x,FROM(y)), WALK(x,y),
    WALK(John,FROM(x)), WALK(John,x), FROM(corner), FROM(x), corner, ⊥

s3: Mary walked to the corner.   WALK(Mary,TO(corner))
    WALK(Mary,TO(corner)), Mary, WALK(x,TO(corner)), WALK(x,TO(y)), WALK(x,y),
    WALK(Mary,TO(x)), WALK(Mary,x), TO(corner), TO(x), corner, ⊥

s4: Mary ran to a cat.   RUN(Mary,TO(cat))
    RUN(Mary,TO(cat)), Mary, RUN(x,TO(cat)), RUN(x,TO(y)), RUN(x,y),
    RUN(Mary,TO(x)), RUN(Mary,x), TO(cat), TO(x), cat, ⊥

s5: John slid from Bill to Mary.   SLIDE(John,[path FROM(Bill),TO(Mary)])
    SLIDE(John,[path FROM(Bill),TO(Mary)]), SLIDE(x,[path FROM(Bill),TO(Mary)]),
    SLIDE(John,[path FROM(x),TO(Mary)]), SLIDE(John,[path FROM(Bill),TO(x)]),
    SLIDE(John,[path x,TO(Mary)]), SLIDE(John,[path FROM(Bill),x]),
    SLIDE(x,[path FROM(y),TO(Mary)]), SLIDE(x,[path FROM(Bill),TO(y)]),
    SLIDE(x,[path y,TO(Mary)]), SLIDE(x,[path FROM(Bill),y]),
    SLIDE(John,[path FROM(x),TO(y)]), SLIDE(John,[path x,TO(y)]),
    SLIDE(John,[path FROM(x),y]), SLIDE(John,[path x,y]),
    SLIDE(x,[path FROM(y),TO(z)]), SLIDE(x,[path y,TO(z)]),
    SLIDE(x,[path FROM(y),z]), SLIDE(x,[path y,z]), SLIDE(x,y), SLIDE(John,x),
    [path FROM(Bill),TO(Mary)], [path FROM(Bill),TO(x)], [path x,TO(Mary)],
    [path FROM(Bill),x], [path FROM(x),TO(y)], [path x,TO(y)], [path FROM(x),y],
    [path x,y], [path FROM(x),TO(Mary)], FROM(Bill), FROM(x), Bill, TO(Mary),
    TO(x), Mary, John, ⊥

Figure 3.5: An enumeration of all possible submeanings of the meaning expressions associated with each utterance in the sample corpus. The meaning of a word must be one of the submeanings of each meaning expression associated with an utterance containing that word.

a        = ⊥                           S2 ∩ S4
the      = ⊥                           S1 ∩ S3
John     ∈ {⊥, John, FROM(x)}          S1 ∩ S2 ∩ S5
Mary     ∈ {⊥, Mary, TO(x)}            S3 ∩ S4 ∩ S5
walked   ∈ {⊥, WALK(x,y), corner}      S2 ∩ S3
from     ∈ {⊥, John, FROM(x)}          S1 ∩ S2 ∩ S5
to       ∈ {⊥, Mary, TO(x)}            S3 ∩ S4 ∩ S5
corner   ∈ {⊥, WALK(x,y), corner}      S2 ∩ S3

Figure 3.6: Weak cross-situational techniques can form these narrowed sets of possible meanings for the words which appear in more than one utterance in the sample corpus.
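The narrowed sets of figure 3.6 follow from applying the same intersection to submeanings. The sketch below repeats a hypothetical submeaning enumerator so that it is self-contained, encodes [path x, y] as a tuple with functor "path", and intersects the submeaning sets of the utterances containing each word. Variables appear in canonical form, so TO(x) is printed as ("TO", "?0"); the names `submeanings` and `weak_meanings` are illustrative only.

```python
from itertools import combinations

BOTTOM = "⊥"


def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def subterms(e, path=()):
    yield path, e
    if isinstance(e, tuple):
        for i, arg in enumerate(e[1:], start=1):
            yield from subterms(arg, path + (i,))


def replace_at(e, path, new):
    if not path:
        return new
    i = path[0]
    return e[:i] + (replace_at(e[i], path[1:], new),) + e[i + 1:]


def canonical(e):
    """Rename variables to ?0, ?1, ... in order of first occurrence."""
    names = {}

    def walk(t):
        if is_var(t):
            return names.setdefault(t, f"?{len(names)}")
        if isinstance(t, tuple):
            return (t[0],) + tuple(walk(a) for a in t[1:])
        return t

    return walk(e)


def submeanings(e):
    """⊥ plus every generalization of every subexpression of a ground e."""
    result = {BOTTOM}
    for _, s in subterms(e):
        seen, frontier = {s}, [s]
        while frontier:
            state = frontier.pop()
            fresh = f"?{len({t for _, t in subterms(state) if is_var(t)})}"
            occurrences = {}
            for path, t in subterms(state):
                if path and not any(is_var(u) for _, u in subterms(t)):
                    occurrences.setdefault(t, []).append(path)
            for t, paths in occurrences.items():
                for r in range(1, len(paths) + 1):
                    for chosen in combinations(paths, r):
                        new = state
                        for p in chosen:
                            new = replace_at(new, p, fresh)
                        new = canonical(new)
                        if new not in seen:
                            seen.add(new)
                            frontier.append(new)
        result |= seen
    return result


corpus = [
    ("John fled from the dog", ("FLEE", "John", ("FROM", "dog"))),
    ("John walked from a corner", ("WALK", "John", ("FROM", "corner"))),
    ("Mary walked to the corner", ("WALK", "Mary", ("TO", "corner"))),
    ("Mary ran to a cat", ("RUN", "Mary", ("TO", "cat"))),
    ("John slid from Bill to Mary",
     ("SLIDE", "John", ("path", ("FROM", "Bill"), ("TO", "Mary")))),
]


def weak_meanings(corpus):
    """Intersect submeaning sets across the utterances containing each word."""
    possible = {}
    for utterance, meaning in corpus:
        subs = submeanings(meaning)
        for word in set(utterance.split()):
            possible[word] = possible.get(word, subs) & subs
    return possible


possible = weak_meanings(corpus)
print(possible["a"], possible["the"])  # {'⊥'} {'⊥'}
```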

Neither strong cross-situational learning applied to syntax alone nor weak cross-situational learning applied to semantics alone is sufficient to uniquely determine the syntactic categories or meanings of all of the words in this example. It is possible, however, to apply strong cross-situational learning techniques to this problem, incorporating both syntactic and semantic constraints. This will force a unique determination of the lexicon. To see this, first remember that strong cross-situational syntax learning has determined that s3 must have either analysis (d) or analysis (g). If s3 took on analysis (d), then it would have the following structure.


[Mary walked to the corner]: WALK(Mary,TO(corner))
    Mary: Mary
    [walked to the corner]: WALK(x,TO(corner))
        walked
        [to the]: TO(x)
            to: TO(x)
            the: ⊥
        corner


We know that the root node must mean WALK(Mary,TO(corner)), since that is given by observation. Furthermore, we know that the must mean ⊥. Since the root meaning contains the symbol TO, which cannot be contributed by the possible meanings for walked and corner, either the word Mary or the word to must take on TO(x) as its meaning. Analysis (d) will not allow Mary to mean TO(x), since the linking rule could not then produce the desired root meaning. Thus to must mean TO(x). Furthermore, Mary must mean Mary, since the root meaning contains the symbol Mary, which no other word can contribute. At this point, since the meanings of both to and the have been determined, the linking rule fixes the meaning of the phrase to the to be TO(x). The linking rule can also operate in reverse, using the known meanings of both Mary and the root utterance to determine that the phrase walked to the corner must mean WALK(x,TO(corner)). At this point, however, the learner can determine that the linking rule has no way of forming the meaning of walked to the corner out of the known meaning for to the and the potential meanings for walked and corner. Thus the learner can infer that utterance s3 cannot have analysis (d), and must therefore have analysis (g).

Analysis  (g)  has  the  following  structure. 
[Mary walked to the corner]: WALK(Mary,TO(corner))
    Mary: Mary
    [walked to the corner]: WALK(x,TO(corner))
        walked: WALK(x,y)
        [to the corner]: TO(corner)
            to: TO(x)
            [the corner]: corner
                the: ⊥
                corner: corner


The learner can annotate this structure with the known meaning for the as well as the root meaning. As before, either the word Mary or the word to must mean TO(x), since no other word can contribute the symbol TO to the root meaning. Furthermore, Mary cannot mean TO(x), since the linking rule would not then be able to derive the root meaning. Thus to must mean TO(x). Likewise, Mary must mean Mary, since at this point no other word can contribute the necessary symbol Mary to the root meaning. Inverse linking can then determine that walked to the corner must mean WALK(x,TO(corner)). Under analysis (g), the only way to derive this meaning, given the possible meanings for its constituent words, is for walked to mean WALK(x,y) and corner to mean corner.

This type of reasoning has allowed the learner to uniquely determine not only the meanings of the words Mary, walked, to, the, and corner, but also that to must be a preposition and the must be a determiner. This rules out the third possible solution to the CSP presented earlier, implying that a must be a determiner and from cannot be a determiner. Furthermore, s4 must have analysis (g), s5 cannot have analysis (e), and neither s1 nor s2 can have analysis (d).

Since s4 must have analysis (g), it must have the following structure.


[Mary ran to a cat]: RUN(Mary,TO(cat))
    Mary: Mary
    [ran to a cat]
        ran
        [to a cat]
            to: TO(x)
            [a cat]
                a: ⊥
                cat


Knowing the meaning of the root node, as well as the meanings of the words Mary, to, and a, allows the learner to uniquely determine that ran must mean RUN(x,y) and cat must mean cat, since these are the only meanings with which the linking rule can produce the desired root meaning.

At this point the learner can analyze s2 in a fashion similar to s3. By an argument analogous to the one used for s3, the learner can rule out analysis (e), determining that only analysis (g) is consistent. In doing so, the learner will assign the meanings John to John and FROM(x) to from. Thus from must be a preposition, s1 must have analysis (g), and s5 must have analysis (i). At this point, FLEE(x,y) is the only possible meaning for fled which will allow s1 to take on the desired root meaning consistent with analysis (g). Finally, by a similar argument, slid must mean SLIDE(x,[path y,z]) and Bill must mean Bill, since only these meanings let the linking rule produce the desired meaning of s5 under analysis (i). With this, the learner has completely determined a unique lexicon that is consistent with the corpus.

While this example is somewhat contrived, it nonetheless illustrates a situation in which the combination of syntactic and semantic reasoning is strictly stronger than either applied in isolation. It is particularly important to highlight the fact that syntactic reasoning can help constrain semantic choices and vice versa. The above example demonstrated a continual interplay between syntax and semantics. The central claim of part 1 of this thesis is that such interplay is crucial to language learning. It is the key that can unlock the quagmire of the various bootstrapping hypotheses reviewed in section j.l, showing that it is not necessary to assume prior language-specific knowledge before the onset of the primary phase of language acquisition. The problem of infinite regress is thus avoided. While actual child language acquisition could not proceed according to the overly simplistic linguistic theory utilized in this example, I conjecture that the process actually performed by children does nonetheless incorporate an interplay between syntax and semantics using cross-situational techniques interwoven with whatever turns out to be the correct linguistic theory. The claim that children learn by an interplay of syntactic and semantic knowledge is fairly uncontroversial. The claim that they utilize a cross-situational strategy to do so is, however, a controversial conjecture. The next chapter attempts to explore the consequences of this conjecture for more substantial linguistic theories.
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Three  Implementations 


To test the ideas discussed in the previous chapter, I have constructed three systems that incorporate these ideas into working implementations. Each of these systems applies cross-situational learning techniques to a combination of both linguistic and non-linguistic input. In accord with current hypotheses about child language acquisition, these systems use only positive examples to drive their acquisition of a language model. These systems differ from one another in the syntactic and semantic theory which they use. Maimra,¹ the first system constructed, incorporates a fixed context-free grammar as its syntactic theory, and represents word and utterance meanings using Jackendovian conceptual structures. Maimra learns both the syntactic categories and meanings of words, given a corpus of utterances paired with sets of possible meanings. Davra,² the second system constructed, extends the results obtained with Maimra by replacing the fixed context-free grammar with a parameterized version of X-bar theory. This grammar contains two binary-valued parameters which determine whether the language is head-initial or head-final, and SPEC-initial or SPEC-final. Given a corpus much like that given to Maimra, Davra learns not only a lexicon similar to that learned by Maimra, but the syntactic parameter settings as well. Davra has been successfully applied to very small corpora in both English and Japanese, learning that English is head-initial while Japanese is head-final. Kenunia,³ the third system constructed, incorporates the most substantial linguistic theory of the three systems. This theory closely follows current linguistic theory and is based on the DP hypothesis, base generation of VP-internal subjects, and V-to-I movement. Kenunia incorporates a version of X-bar theory with sixteen binary-valued parameters that supports adjunction as well as head-complement structures. More importantly, Kenunia supports movement and empty categories. Two types of empty categories are supported: traces of movement, and non-overt words and morphemes. Kenunia incorporates several other linguistic subsystems in addition to X-bar theory. These include θ-theory, the empty category principle (ECP), and the case filter. The current version of Kenunia has learned both the parameter settings of this theory and the syntactic categories of words, given an initial lexicon pairing words to their θ-grids. Future work will extend Kenunia to learn these θ-grids from the corpus, along with the syntactic categories and parameters, instead of giving them to Kenunia as prior input. In the longer term, I also plan to integrate the language learning strategies from Maimra, Davra, and Kenunia with the visual perception mechanisms incorporated in Abigail⁴ and discussed in part II of this thesis. The remainder of this chapter will discuss Maimra, Davra, and Kenunia in greater detail.

¹ Maimra, or מימרא, is an Aramaic word which means word.

² Davra, or דברא, is an Aramaic word which does not mean word.

³ Kenunia, or קנוניא, is an Aramaic word which means conspiracy. In Kenunia the linguistic principles conspire to enable the learner to acquire language.

⁴ Abigail is not an Aramaic word.




S     → NP [VP]
S-bar → {COMP} [S]
NP    → {DET} [N] {S-bar|NP|VP|PP}*
VP    → {AUX} [V] {S-bar|NP|VP|PP}*
PP    → [P] {S-bar|NP|VP|PP}*
AUX   → {DO|BE|{MODAL|TO|{{MODAL|TO}} HAVE} {BE}}

Figure 4.1: The context-free grammar used by Maimra. The categories enclosed in boxes (shown here in square brackets) indicate the heads of each phrase type. The distinction between head and complement children is used by the linking rule to form the meaning of a phrase out of the meaning of its constituents.


4.1  Maimra 

Maimra (Siskind 1990) was constructed as an initial test of the feasibility of applying cross-situational learning techniques to a combination of linguistic and non-linguistic input in an attempt to simultaneously learn both syntactic and semantic information about language. Maimra is given a fixed context-free grammar as input; grammar acquisition is not part of the task faced by Maimra. Though the grammar is not hardwired into Maimra, and could be changed to attempt acquisition experiments with different input grammars, all of the experiments discussed in this chapter utilize the grammar given in figure 4.1. This grammar was derived from a variant of X-bar theory by fixing the head-initial and SPEC-initial parameters, and adding rules for S, S-bar, and AUX. Note that this grammar severely overgenerates due to the lack of subcategorization restrictions. The grammar allows nouns, verbs, and prepositions to take an arbitrary number of complements of any type. Maimra is nonetheless able to learn despite the ensuing ambiguity.

Maimra incorporates a semantic theory based on Jackendovian conceptual structures. Words, phrases, and complete utterances are assigned fragments of conceptual structure as their meaning. The meaning of a phrase is derived from the meanings of its constituents by the linking rule discussed in section 3.1. To reiterate briefly, the linking rule operates as follows. The linking rule is mediated by a parse tree. Lexical entries provide the meanings of terminal nodes. Each non-terminal node has a distinguished child called its head. The remaining children are called the complements of the head. Unlike the puzzle given in section 3.3, the grammar given to Maimra indicates the head child for every phrase type. Figure 4.1 depicts this information by enclosing the head of each phrase in a box. The meaning of a non-terminal is derived from the meaning of its head by substituting the meanings of the complements for the variables in the meaning of the head. Complements whose meaning is the distinguished symbol ⊥ are ignored and not linked to a variable in the head. Maimra restricts all complement meanings to be variable-free so that no variable renaming is required.
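The linking rule can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not Maimra's implementation: meanings are nested tuples such as ("ORIENT", "?0", ("TOWARD", "?1")), variables are strings beginning with "?", and because the text does not say which variable a given complement fills, the hypothetical function link returns every result of placing each non-⊥ complement meaning into a distinct variable of the head meaning. Complements meaning ⊥ are skipped, and the remaining complements must be variable-free, as stated above.

```python
from itertools import permutations

BOTTOM = "⊥"  # complements with this meaning are ignored by the linking rule


def is_var(e):
    return isinstance(e, str) and e.startswith("?")


def variables(e):
    """Distinct variables of e, in order of first occurrence."""
    if is_var(e):
        return [e]
    found = []
    if isinstance(e, tuple):
        for arg in e[1:]:  # e[0] is the functor name, never a variable
            for v in variables(arg):
                if v not in found:
                    found.append(v)
    return found


def substitute(e, bindings):
    """Replace variables of e according to bindings (variable -> meaning)."""
    if is_var(e):
        return bindings.get(e, e)
    if isinstance(e, tuple):
        return (e[0],) + tuple(substitute(arg, bindings) for arg in e[1:])
    return e


def link(head_meaning, complement_meanings):
    """All phrase meanings obtained by putting each non-⊥ complement meaning
    into a distinct variable of the head meaning (assumed linking choice)."""
    fillers = [m for m in complement_meanings if m != BOTTOM]
    if any(variables(m) for m in fillers):
        raise ValueError("complement meanings must be variable-free")
    results = []
    for chosen in permutations(variables(head_meaning), len(fillers)):
        meaning = substitute(head_meaning, dict(zip(chosen, fillers)))
        if meaning not in results:
            results.append(meaning)
    return results


print(link(("ORIENT", "?0", ("TOWARD", "?1")), ["Mary"]))
```

For a head meaning ORIENT(?0, TOWARD(?1)) and the complement Mary this yields two candidates, one per variable; a head with more non-⊥ complements than variables yields none.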

In addition to the grammar, Maimra is given a corpus of linguistic and non-linguistic input. Figure 4.2 depicts one such corpus given to Maimra. This corpus consists of a sequence of nine multi-word utterances, ranging in length from two to seven words. Each utterance is paired with a set of between three and six possible meanings.⁵ Maimra is not told which of the meanings is the correct one for each utterance, only that the set contains the correct meaning as one of its members. Thus the corpus given to Maimra can exhibit referential uncertainty in mapping the linguistic to the non-linguistic input.

⁵ As described in Siskind (1990), Maimra is not given this set of meanings directly but instead derives this set from more primitive information using perceptual rules. These rules state, for instance, that seeing an object at one location followed by seeing it later at a different location implies that the object moved from the first location to the second. The corpus actually given to Maimra pairs utterances with sequences of states rather than potential utterance meanings. Thus Maimra would derive GO(x, [path FROM(y), TO(z)]) as a potential meaning for an utterance if the state sequence paired
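The perceptual rule mentioned in the accompanying footnote, which infers motion from an object seen at one location and later at another, can be sketched as follows. It is a hypothetical reconstruction, not Maimra's rule set: states are assumed to be lists of BE(x, AT(y)) facts encoded as tuples, a path is encoded as ("PATH", ...), and, following the meaning sets in figure 4.2, every observed BE fact and every partial GO description is assumed to be offered as a candidate alongside GO(x, [path FROM(y), TO(z)]).

```python
def perceive(states):
    """Candidate utterance meanings from a sequence of states (hypothetical rule).

    Each state is a list of facts of the form ("BE", x, ("AT", y)).
    """
    meanings = []

    def add(m):
        if m not in meanings:  # keep the first occurrence, preserve order
            meanings.append(m)

    for state in states:  # every observed location fact is a candidate
        for fact in state:
            add(fact)
    for i, earlier in enumerate(states):
        for later in states[i + 1:]:
            for _, x, (_, y) in earlier:
                for _, x2, (_, z) in later:
                    if x2 == x and z != y:  # same object, new location: it moved
                        add(("GO", x, ("PATH",)))
                        add(("GO", x, ("FROM", y)))
                        add(("GO", x, ("TO", z)))
                        add(("GO", x, ("PATH", ("FROM", y), ("TO", z))))
    return meanings


john_rolled = [[("BE", "person1", ("AT", "person3"))],
               [("BE", "person1", ("AT", "person2"))]]
for meaning in perceive(john_rolled):
    print(meaning)
```

For person1 seen at person3 and later at person2, this produces the six candidates listed for John rolled in figure 4.2; an object that stays put yields only its BE fact.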

Maimra processes the corpus utterance by utterance, producing a disjunctive lexicon formula for each utterance meaning-set pair. No information other than this lexicon formula is retained after processing an utterance. This processing occurs in two phases, corresponding to the parser and linker from the architecture given in figure 2.1. In the first phase, Maimra constructs a disjunctive parse tree representing the set of all possible ways of parsing the input utterance according to the given context-free grammar. Appendix A illustrates sample disjunctive parse trees which are produced by Maimra when processing the corpus from figure 4.2. Structural ambiguity can result both from the fact that the grammar is ambiguous and from the fact that Maimra does not yet have unique mappings from words to their syntactic categories. Initially, Maimra assumes that each word can assume any terminal category. This introduces substantial lexical ambiguity and results in corresponding structural ambiguity. As Maimra further constrains the lexicon, she can rule out some word-to-category mappings and thus reduce the lexical ambiguity when processing subsequent utterances. Thus parse trees tend to have less ambiguity as Maimra processes more utterances. This is evident in the parse trees depicted on pages 210 and 213, which are also illustrated below. When Maimra first parses the utterance Bill ran to Mary, the syntactic category of ran is not yet fully determined. Thus Maimra produces the following disjunctive parse tree for this utterance.

(OR (S (OR (NP (N BILL) (NP (N RAN)))
           (NP (N BILL) (VP (V RAN)))
           (NP (N BILL) (PP (P RAN))))
       (VP (V TO) (NP (N MARY))))
    (S (NP (N BILL))
       (OR (VP (V RAN) (PP (P TO)) (NP (N MARY)))
           (VP (V RAN) (VP (V TO)) (NP (N MARY)))
           (VP (V RAN) (NP (N TO)) (NP (N MARY)))
           (VP (OR (AUX (DO RAN))
                   (AUX (BE RAN))
                   (AUX (MODAL RAN))
                   (AUX (TO RAN))
                   (AUX (HAVE RAN)))
               (V TO)
               (NP (N MARY)))
           (VP (V RAN)
               (OR (NP (DET TO) (N MARY))
                   (NP (N TO) (NP (N MARY)))))
           (VP (V RAN) (VP (V TO) (NP (N MARY))))
           (VP (V RAN) (PP (P TO) (NP (N MARY)))))))

As a result of processing that utterance, in conjunction with the constraint provided by prior utterances, Maimra can determine that ran must be a verb. Thus when parsing the subsequent utterance Bill ran from John, which nominally has the same structure, Maimra can nonetheless produce the following smaller disjunctive parse tree by taking into account partial information acquired so far.

with that utterance contained a state in which BE(x, AT(y)) was true, followed later by a state where BE(x, AT(z)) was true. This primitive theory of event perception is grossly inadequate and largely irrelevant to the remainder of the learning strategy. For the purposes of this chapter, Maimra's perceptual rules can be ignored and the input to Maimra viewed as comprising a set of potential meanings associated with each utterance. The ultimate goal is to base future language acquisition models on the theory of event perception put forth in part II of this thesis, instead of the simplistic rules used by Maimra.

BE(person1, AT(person3)) ∨ BE(person1, AT(person2)) ∨
GO(person1, [path ]) ∨ GO(person1, FROM(person3)) ∨
GO(person1, TO(person2)) ∨ GO(person1, [path FROM(person3), TO(person2)])

John rolled.

BE(person2, AT(person3)) ∨ BE(person2, AT(person1)) ∨
GO(person2, [path ]) ∨ GO(person2, FROM(person3)) ∨
GO(person2, TO(person1)) ∨ GO(person2, [path FROM(person3), TO(person1)])

Mary rolled.

BE(person3, AT(person1)) ∨ BE(person3, AT(person2)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(person2)) ∨ GO(person3, [path FROM(person1), TO(person2)])

Bill rolled.

BE(object1, AT(person1)) ∨ BE(object1, AT(person2)) ∨
GO(object1, [path ]) ∨ GO(object1, FROM(person1)) ∨
GO(object1, TO(person2)) ∨ GO(object1, [path FROM(person1), TO(person2)])

The cup rolled.

BE(person3, AT(person1)) ∨ BE(person3, AT(person2)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(person2)) ∨ GO(person3, [path FROM(person1), TO(person2)])

Bill ran to Mary.

BE(person3, AT(person1)) ∨ BE(person3, AT(person2)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(person2)) ∨ GO(person3, [path FROM(person1), TO(person2)])

Bill ran from John.

BE(person3, AT(person1)) ∨ BE(person3, AT(object1)) ∨
GO(person3, [path ]) ∨ GO(person3, FROM(person1)) ∨
GO(person3, TO(object1)) ∨ GO(person3, [path FROM(person1), TO(object1)])

Bill ran to the cup.

BE(object1, AT(person1)) ∨ BE(object1, AT(person2)) ∨
GO(object1, [path ]) ∨ GO(object1, FROM(person1)) ∨
GO(object1, TO(person2)) ∨ GO(object1, [path FROM(person1), TO(person2)])

The cup slid from John to Mary.

ORIENT(person1, TO(person2)) ∨
ORIENT(person2, TO(person3)) ∨
ORIENT(person3, TO(person1))

John faced Mary.

Figure 4.2: A sample corpus presented to both Maimra and Davra. The corpus exhibits referential uncertainty in that each utterance is paired with several possible meanings. Neither Maimra nor Davra is told which is the correct meaning, only that one of the meanings is correct.

(OR (S (NP (N BILL) (VP (V RAN))) (VP (V FROM) (NP (N JOHN))))
    (S (NP (N BILL))
       (OR (VP (V RAN) (PP (P FROM)) (NP (N JOHN)))
           (VP (V RAN) (VP (V FROM)) (NP (N JOHN)))
           (VP (V RAN) (NP (N FROM)) (NP (N JOHN)))
           (VP (OR (AUX (DO RAN))
                   (AUX (BE RAN))
                   (AUX (MODAL RAN))
                   (AUX (TO RAN))
                   (AUX (HAVE RAN)))
               (V FROM)
               (NP (N JOHN)))
           (VP (V RAN)
               (OR (NP (DET FROM) (N JOHN))
                   (NP (N FROM) (NP (N JOHN)))))
           (VP (V RAN) (VP (V FROM) (NP (N JOHN))))
           (VP (V RAN) (PP (P FROM) (NP (N JOHN)))))))


Maimra uses a derivative of the CKY parsing algorithm (Kasami 1965, Younger 1967) to produce the disjunctive parse tree. Thus the size of the disjunctive parse tree will always be polynomial in the length of the input. The resulting tree may appear larger when printed, since a given entry from the well-formed substring table may be a constituent of several other entries and thus may be printed multiple times. Nonetheless, the internal representation of the parse tree is factored to retain its polynomial size. This factored representation stores only a single copy of each subtree in the disjunctive parse tree, even though that subtree may be referenced multiple times. Furthermore, the fracturing process, to be described shortly, preserves the factored representation so that the resulting disjunctive lexicon formulae are kept to a manageable size.
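The packing that keeps the disjunctive parse tree polynomial can be illustrated with a plain CKY chart. This is a sketch under assumptions, not Maimra's parser: it uses a small hypothetical binary grammar rather than the grammar of figure 4.1 (which allows any number of complements), and cky and count_trees are illustrative names. Each chart item (start, end, category) is stored once, together with a list of the ways it can be built, so the chart stays polynomial in the utterance length even when the number of distinct trees it packs grows faster.

```python
from collections import defaultdict

# Hypothetical binary grammar: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]


def cky(words, lexicon):
    """Packed chart: chart[i, j][cat] lists every way cat can span words[i:j].
    Each (i, j, cat) item is stored once, however many larger items use it."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(list))
    for i, w in enumerate(words):
        for cat in lexicon[w]:  # a word may carry several candidate categories
            chart[i, i + 1][cat].append(w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    if left in chart[i, k] and right in chart[k, j]:
                        chart[i, j][parent].append((k, left, right))
    return chart


def count_trees(chart, i, j, cat, memo=None):
    """Number of distinct trees packed into chart item (i, j, cat)."""
    memo = {} if memo is None else memo
    key = (i, j, cat)
    if key not in memo:
        total = 0
        for d in chart[i, j].get(cat, []):
            if isinstance(d, str):  # lexical entry
                total += 1
            else:
                k, left, right = d
                total += (count_trees(chart, i, k, left, memo)
                          * count_trees(chart, k, j, right, memo))
        memo[key] = total
    return memo[key]


lexicon = {"Bill": {"NP"}, "saw": {"V"}, "Mary": {"NP"}, "from": {"P"},
           "John": {"NP"}, "to": {"P"}, "Sue": {"NP"}}
words = "Bill saw Mary from John to Sue".split()
chart = cky(words, lexicon)
print(count_trees(chart, 0, len(words), "S"))  # 5
```

The seven-word sentence packs five trees (the familiar prepositional-phrase attachment ambiguity) into a small number of chart items; printing every tree would repeat shared items such as the PP to Sue, much as the printed disjunctive parse trees above repeat subtrees.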

After constructing the disjunctive parse tree for an input utterance, Maimra applies the linking rule in reverse to produce a disjunctive lexicon formula. This second phase is a variant of the fracturing procedure described in section 3.1. Recall that the fracturing procedure recursively applies to two arguments: a parse tree fragment and a meaning expression fragment. For the base case, when the parse tree fragment consists of a terminal node, a lexical entry proposition is formed, pairing the word associated with that node with the syntactic category labeling that node and the input meaning expression fragment. For example, fracturing the parse tree fragment (p to) with the meaning expression fragment (from ?0) would produce the lexical entry proposition (definition to p (from ?0)). For the inductive case, Maimra forms all possible ways of assigning subexpressions of the meaning expression fragment as the meaning of each complement constituent of the parse tree fragment. Maimra then replaces those subexpressions in the original meaning expression fragment with variables, and assigns the resulting meaning expression fragment to the head constituent of the parse tree fragment. Each constituent of the parse tree fragment is then recursively fractured with its associated meaning expression fragment to yield a disjunctive lexicon formula. For each possible subexpression assignment, Maimra forms a conjunction of the lexicon formulae returned for each constituent. Maimra then forms a disjunction of these conjunctions. Thus the recursive fracturing process produces a formula with alternating layers of disjunction and conjunction.

This process of constructing a disjunctive lexicon formula is best illustrated by way of an example. Consider fracturing the following parse tree:

(S (NP (N JOHN)) (VP (V FACED) (NP (N MARY))))

along with the meaning expression ORIENT(John, TOWARD(Mary)). This meaning expression has four subexpressions, namely ⊥, John, Mary, and TOWARD(Mary). Each of these can be assigned as a potential meaning for John. Thus, the following reduction illustrates the first step in producing a disjunctive lexicon formula.⁶

fracture(John faced Mary, ORIENT(John, TOWARD(Mary)))

⇓

(i) (John = ⊥ ∧ fracture(faced Mary, ORIENT(John, TOWARD(Mary)))) ∨
(ii) (John = John ∧ fracture(faced Mary, ORIENT(x, TOWARD(Mary)))) ∨
(iii) (John = Mary ∧ fracture(faced Mary, ORIENT(John, TOWARD(x)))) ∨
(iv) (John = TOWARD(Mary) ∧ fracture(faced Mary, ORIENT(John, x)))

In case (i), when John is assigned ⊥ as its meaning, Mary can then obviously take on as its meaning any of the four subexpressions of ORIENT(John, TOWARD(Mary)).

fracture(faced Mary, ORIENT(John, TOWARD(Mary)))

⇓

(Mary = ⊥ ∧ faced = ORIENT(John, TOWARD(Mary))) ∨
(Mary = John ∧ faced = ORIENT(x, TOWARD(Mary))) ∨
(Mary = Mary ∧ faced = ORIENT(John, TOWARD(x))) ∨
(Mary = TOWARD(Mary) ∧ faced = ORIENT(John, x))

In case (ii), when John is assigned John as its meaning, Mary can take on three possible meanings.

fracture(faced Mary, ORIENT(x, TOWARD(Mary)))

⇓

(Mary = ⊥ ∧ faced = ORIENT(x, TOWARD(Mary))) ∨
(Mary = Mary ∧ faced = ORIENT(x, TOWARD(y))) ∨
(Mary = TOWARD(Mary) ∧ faced = ORIENT(x, y))

In case (iii), when John is assigned Mary as its meaning, Mary can take on two possible meanings.

fracture(faced Mary, ORIENT(John, TOWARD(x)))

⇓

(Mary = ⊥ ∧ faced = ORIENT(John, TOWARD(x))) ∨
(Mary = John ∧ faced = ORIENT(x, TOWARD(y)))

⁶ The astute reader may wonder why a fifth possibility is not considered where the entire expression ORIENT(John, TOWARD(Mary)) is associated with John and the meaning of faced Mary is taken to be simply the variable x. Maimra adopts an additional restriction that does not allow a head to take on a meaning that is simply a variable, thus ruling out this fifth possibility. This restriction can be interpreted as stating that every head must contribute some semantic content to the meaning of its parent phrase. The motivation for this restriction is simply computational efficiency. Adopting this restriction reduces the ambiguity introduced during the fracturing process. The downside of this restriction is that it rules out the standard analysis of the preposition of. In this analysis, of is treated simply as a case marker such that the meaning of the phrase of NP would be taken to be the same as the meaning of the NP. This requires taking the meaning of of to be simply the variable x, in contradiction to the above restriction.

In case (iv), when John is assigned TOWARD(Mary) as its meaning, Mary can also take on two possible meanings.

fracture(faced Mary, ORIENT(John, x))

⇓

(Mary = ⊥ ∧ faced = ORIENT(John, x)) ∨
(Mary = John ∧ faced = ORIENT(x, y))

Putting  this  all  together  yields  the  following  disjunctive  lexicon  formula. 

(or (and John = ⊥
         (or (and Mary = ⊥
                  faced = ORIENT(John, TOWARD(Mary)))
             (and Mary = John
                  faced = ORIENT(x, TOWARD(Mary)))
             (and Mary = Mary
                  faced = ORIENT(John, TOWARD(x)))
             (and Mary = TOWARD(Mary)
                  faced = ORIENT(John, x))))
    (and John = John
         (or (and Mary = ⊥
                  faced = ORIENT(x, TOWARD(Mary)))
             (and Mary = Mary
                  faced = ORIENT(x, TOWARD(y)))
             (and Mary = TOWARD(Mary)
                  faced = ORIENT(x, y))))
    (and John = Mary
         (or (and Mary = ⊥
                  faced = ORIENT(John, TOWARD(x)))
             (and Mary = John
                  faced = ORIENT(x, TOWARD(y)))))
    (and John = TOWARD(Mary)
         (or (and Mary = ⊥
                  faced = ORIENT(John, x))
             (and Mary = John
                  faced = ORIENT(x, y)))))
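The recursive procedure just illustrated can be sketched in Python. This is a minimal sketch under stated assumptions, not Maimra's code: meaning expressions are nested tuples such as ("ORIENT", "John", ("TOWARD", "Mary")); variables are strings like "?0", following the (from ?0) notation above; a parse tree node is (category, head, [complements]) or, for a terminal, (category, word); when a complement's meaning occurs more than once in its parent's meaning, every occurrence is replaced (an assumption). The names fracture, candidates, and the ("or", ...), ("and", ...), ("def", word, category, meaning) formula shapes are hypothetical. Complement meanings are restricted to ⊥ or variable-free proper subexpressions, so a head never receives a bare variable.

```python
from itertools import count

BOTTOM = "⊥"  # the distinguished "no meaning" symbol


def is_var(e):
    return isinstance(e, str) and e.startswith("?")


def subexpressions(e):
    """Yield e and, recursively, every argument of e (functor names excluded)."""
    yield e
    if isinstance(e, tuple):
        for arg in e[1:]:
            yield from subexpressions(arg)


def variable_free(e):
    return not any(is_var(s) for s in subexpressions(e))


def fresh_var(e):
    used = {s for s in subexpressions(e) if is_var(s)}
    return next(f"?{i}" for i in count() if f"?{i}" not in used)


def replace(e, old, new):
    """Replace every occurrence of subexpression old in e by new."""
    if e == old:
        return new
    if isinstance(e, tuple):
        return (e[0],) + tuple(replace(a, old, new) for a in e[1:])
    return e


def candidates(meaning):
    """⊥ plus each distinct variable-free proper subexpression of meaning."""
    proper = []
    if isinstance(meaning, tuple):
        for arg in meaning[1:]:
            proper.extend(subexpressions(arg))
    return [BOTTOM] + list(dict.fromkeys(s for s in proper if variable_free(s)))


def fracture(tree, meaning):
    if isinstance(tree[1], str):  # terminal: a lexical entry proposition
        return ("def", tree[1], tree[0], meaning)
    _, head, complements = tree
    return _link(head, list(complements), meaning, [])


def _link(head, complements, meaning, done):
    """Assign a meaning to each complement in turn, then fracture the head."""
    if not complements:
        head_formula = fracture(head, meaning)
        return ("and", *done, head_formula) if done else head_formula
    first, rest = complements[0], complements[1:]
    disjuncts = []
    for c in candidates(meaning):
        head_meaning = meaning if c == BOTTOM else replace(meaning, c, fresh_var(meaning))
        disjuncts.append(_link(head, rest, head_meaning, done + [fracture(first, c)]))
    return ("or", *disjuncts)


tree = ("S", ("VP", ("V", "faced"), [("NP", ("N", "Mary"), [])]),
        [("NP", ("N", "John"), [])])
formula = fracture(tree, ("ORIENT", "John", ("TOWARD", "Mary")))
print([len(d[2]) - 1 for d in formula[1:]])  # [4, 3, 2, 2]
```

Applied to John faced Mary, the sketch yields four top-level disjuncts with four, three, two, and two inner alternatives, the same eleven combinations as the formula above, with variables written ?0 and ?1 and possibly renamed relative to the printed x and y.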


The fracturing procedure actually used by Maimra is slightly more complex than the above procedure, in two ways. First, it is extended to accept disjunctive parse trees. Fracturing a disjunctive parse tree fragment with a meaning expression fragment is simply the disjunction of the results of fracturing each disjunct in the disjunctive parse tree fragment with the same meaning expression fragment. Maimra memoizes recursive calls to fracture to mirror the factored nature of the disjunctive parse tree in the resulting disjunctive lexicon formula.⁷ Second, recall that to handle referential uncertainty, each input utterance is associated with a set of meaning expressions. Maimra fractures each meaning expression for the current utterance with the same disjunctive parse tree for this utterance to produce a disjunctive lexicon formula. A disjunction is formed from these formulae to yield the aggregate lexicon formula for the input utterance.
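The benefit of memoization claimed in this paragraph and its footnote, that repeated calls return pointers to one shared copy of each result and so produce a factored representation, can be seen in a few lines of Python. The function expand below is a hypothetical stand-in for fracture: each result refers to two earlier results, so the structure printed as a tree has exponentially many leaves, while only one object per distinct argument is ever built. functools.lru_cache supplies the memo table.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def expand(n):
    """Hypothetical recursive builder: node n is a disjunction of nodes n-1 and
    n-2, so the same subresults are requested many times."""
    if n < 2:
        return ("leaf", n)
    return ("or", expand(n - 1), expand(n - 2))


def printed_leaves(t):
    """Leaves seen when the structure is printed as a tree (sharing ignored)."""
    if t[0] == "leaf":
        return 1
    return printed_leaves(t[1]) + printed_leaves(t[2])


def distinct_nodes(t, seen=None):
    """Objects actually stored, counting each shared object once."""
    if seen is None:
        seen = set()
    if id(t) in seen:
        return 0
    seen.add(id(t))
    if t[0] == "leaf":
        return 1
    return 1 + distinct_nodes(t[1], seen) + distinct_nodes(t[2], seen)


tree = expand(20)
print(expand.cache_info().misses, distinct_nodes(tree), printed_leaves(tree))  # 21 21 10946
```

Only 21 evaluations run and 21 objects exist, yet printing the structure as a tree would show 10946 leaves; the shared subtree tree[1][1] is the very same object as tree[2].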

⁷ Memoization eliminates multiple evaluations of a function called with the same arguments. The first time f(x1, ..., xn) is called, the function is evaluated and the result stored in a table. Subsequent calls to f with the same arguments x1, ..., xn retrieve this result from the table instead of reevaluating f(x1, ..., xn). An additional benefit of memoization is that multiple evaluations of a function called with the same arguments return pointers to the same copy of the result, thus creating a factored representation.

John:     [N]          person1
Mary:     [N]          person2
Bill:     [N]          person3
cup:      [N]          object1
the:      [NSPEC]      ⊥
rolled:   [V]          GO(x, [path ])
ran:      [V]          GO(x, y)
slid:     [V]          GO(x, [path y, z])
faced:    [V]          ORIENT(x, TO(y))
from:     [N, V, P]    FROM(x)
to:       [N, V, P]    TO(x)

Figure 4.3: The lexicon inferred by Maimra for the corpus from figure 4.2. Note that Maimra has converged to a unique word-to-meaning mapping for each word in the corpus, as well as a unique word-to-category mapping for all but two words.


Appendix A illustrates the series of disjunctive parse trees and disjunctive lexicon formulae produced by Maimra when processing the corpus from figure 4.2. Each lexicon formula produced corresponds to a single input utterance. Maimra determines the lexicon corresponding to the corpus by forming a conjunction of these lexicon formulae, conjoining this with a conjunction of monosemy formulae to implement the monosemy constraint, and finding satisfying truth assignments to the lexical entry propositions in the entire resulting formula. Maimra actually performs this process repeatedly as each new utterance arrives. Even though there may be multiple consistent lexica during intermediate stages when only part of the corpus has been processed, it may nonetheless be possible to rule out some word-to-category or word-to-meaning mappings. Maimra can use this partial information to reduce the size of structures produced when processing subsequent input utterances. I have already discussed how reduced lexical ambiguity can result in smaller disjunctive parse trees. Furthermore, reduced structural ambiguity, combined with ruling out impossible word-to-meaning mappings, can result in the production of smaller disjunctive lexicon formulae. This is evident when comparing the lexicon formula corresponding to Bill ran to Mary on page 211 with the lexicon formula corresponding to Bill ran from John on page 214. Though the input utterances are similar, and are paired with analogous meaning expressions, the latter utterance yields a smaller disjunctive lexicon formula due to the knowledge gleaned from prior input.
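The cross-situational step, conjoining per-utterance formulae under the monosemy constraint and reading off whatever is already determined, can be sketched as follows. This is a deliberate simplification and not Maimra's solver: each utterance's lexicon formula is assumed to be already expanded into disjunctive normal form, a list of alternative partial lexica mapping words to meanings, which can be exponentially larger than Maimra's factored and/or formulae; monosemy is enforced simply by requiring conjoined alternatives to agree on every shared word. The two utterances, their meanings (ROLL(x), FALL(x)), and all function names are hypothetical.

```python
from functools import reduce


def compatible(a, b):
    """Monosemy: two partial lexica agree on every word they both mention."""
    return all(b[w] == m for w, m in a.items() if w in b)


def conjoin(f, g):
    """Conjunction of two lexicon formulae given as lists of partial lexica."""
    result = []
    for a in f:
        for b in g:
            if compatible(a, b):
                merged = {**a, **b}
                if merged not in result:
                    result.append(merged)
    return result


def consistent_lexica(utterance_formulae):
    return reduce(conjoin, utterance_formulae, [{}])


def determined(lexica):
    """Words whose meaning is the same in every consistent lexicon."""
    if not lexica:
        return {}
    shared = set.intersection(*(set(lex) for lex in lexica))
    first = lexica[0]
    return {w: first[w] for w in shared if all(lex[w] == first[w] for lex in lexica)}


# Referential uncertainty: each utterance leaves two candidate referents for John.
john_rolled = [{"John": "person1", "rolled": "ROLL(x)"},
               {"John": "person2", "rolled": "ROLL(x)"}]
john_fell = [{"John": "person1", "fell": "FALL(x)"},
             {"John": "person3", "fell": "FALL(x)"}]

print(determined(consistent_lexica([john_rolled])))  # {'rolled': 'ROLL(x)'}
lexica = consistent_lexica([john_rolled, john_fell])
print(lexica)  # [{'John': 'person1', 'rolled': 'ROLL(x)', 'fell': 'FALL(x)'}]
```

Neither utterance alone fixes the referent of John, but together they admit only person1. Because conjunction is commutative, processing john_fell first returns the same lexicon, mirroring the order-independence point made in this section; only the sizes of intermediate results depend on the order.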

Using the above techniques, Maimra can successfully derive the lexicon shown in figure 4.3 from the corpus given in figure 4.2. Inferring this lexicon requires several minutes of elapsed time on a Symbolics XL1200 computer. Thus Maimra converges to a unique and correct meaning for every word in the corpus, as well as a unique and correct syntactic category for all but two of the words in the corpus.

From  a  theoretical  perspective,  the  lexicon  produced  by  Maimra  is  independent  of  the  order  in 
which  the  corpus  is  processed.  This  is  because  each  utterance  in  the  corpus  is  processed  to  yield  a 
lexicon  formula  which  characterizes  those  lexica  that  are  consistent  with  that  utterance.  Maimra 
simply  conjoins  those  formulae  to  find  a  lexicon  consistent  with  the  entire  corpus.  As  a  practical  matter, 
however,  the  computational  complexity  of  the  learning  algorithm  is  affected  by  the  processing  order, 
since  Maimra  uses  previously  acquired  knowledge  to  reduce  the  size  of  subsequently  generated  lexicon 
formulae.  Maimra  works  best  if  the  corpus  is  ordered  so  that  shorter  utterances  and  utterances  with 
fewer unknown words appear first.

Despite Maimra's success in inferring a lexicon from semantically annotated input utterances, the theory underlying Maimra suffers from two severe limitations that preclude it from being a complete account of child language acquisition. First, Maimra relies on a fixed context-free grammar being available prior to the lexicon acquisition process. It appears unreasonable to assume that children know the grammar of their native language before they learn the syntactic categories or meanings of any words. More likely, they must learn the grammar either along with, or subsequent to, the lexicon. Second, Maimra has been tested only on an English corpus. A satisfying theory of language acquisition must be capable of acquiring any human language, not just English.

In an attempt to rectify the above two shortcomings, a second system called Davra (Siskind 1991) was constructed. Davra is very similar to Maimra in many ways. Both represent word, phrase, and utterance meanings using the same form of Jackendovian conceptual structure meaning expressions. Furthermore, both receive input in the same form: a corpus of utterances, each paired with a set of potential meanings for that utterance. Thus Davra, like Maimra, learns in the presence of referential uncertainty. Davra differs from Maimra, however, in basing its syntactic theory on a parameterized version of X-bar theory rather than on a fixed context-free grammar given as input to the learner. Davra's innate endowment includes the formulation of X-bar theory, embodied in the acquisition model, but does not include the parameter settings particular to the language being learned. Davra acquires the parameter settings from the corpus, simultaneously with the lexicon, using the cross-situational learning architecture described in section 2.1. Thus Davra learns three things, namely parameter settings, word-to-category mappings, and word-to-meaning mappings, without any prior knowledge of such parameter settings or mappings.

The variant of X-bar theory incorporated into Davra can be summarized as follows.

1. The syntactic structures constructed by Davra are binary branching. Each node has zero, one, or two children. Nodes with no children are terminals. Nodes with one or two children are head-complement structures. One child of a head-complement structure is always the head. The remaining child, if present, is its complement.

2. Davra labels each node with one of the category labels X, XSPEC,⁸ X-bar, or XP, where X is one of the base categories N, V, P, or I.


3. Terminals must be labeled with either XSPEC or X for some base category X.

4. Non-terminal nodes take on one of the following five configurations:


[Diagrams of configurations (a) through (e) not recoverable from the source.]


where  X  and  Y  freely  range  over  the  base  categories.  The  nodes  enclosed  in  boxes  indicate  which 
child  is  taken  to  be  the  head  of  a  head-complement  structure  as  far  as  the  linking  rule  is  concerned. 

*The linguistic literature has waffled somewhat over the term SPEC, sometimes considering it to be a category label, or class of category labels such as determiner, and other times taking it to be the name of a position where determiners, among other things, can appear. Davra takes Xspec to be a class of category labels, with Nspec, for instance, being a synonym for DET. This is a somewhat outdated approach to X̄ theory. In contrast, Kenunia takes SPEC to be a position, namely the non-adjunct sister to a node of bar-level one. This approach is more in line with the variant of X̄ theory presented in Chomsky (1985). Davra should not be considered a priori incorrect because of this. Many current authors still adopt the former position (cf. Lightfoot 1991 pp. 186-187).
A given language will allow only a subset of the above five structures, however. One binary-valued parameter determines whether the language is SPEC-initial or SPEC-final. Structure (c) is not allowed if the language is SPEC-initial, while structure (b) is not allowed if the language is SPEC-final. A second binary-valued parameter determines whether the language is head-initial or head-final. Structure (e) is not allowed if the language is head-initial, while structure (d) is not allowed if the language is head-final.

5. The top-level node corresponding to an input utterance must be labeled IP.

6. The category label Ispec is taken to be a synonym for the category label NP.

7. The category label Ī is taken to be a synonym for the category label VP.
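As a rough illustration of how the two binary-valued parameters prune configurations (a) through (e), the following sketch encodes each configuration as a string (bracketed child = head) and returns the letters a given parameter setting permits. It is written in plain Common Lisp, not the nondeterministic Lisp used for Davra itself, and the names *configurations*, allowed-configurations, and describe-configurations are hypothetical illustrations, not part of Davra.

```lisp
;; Hypothetical encoding of configurations (a)-(e); the bracketed child is
;; the head. X and Y range over the base categories N, V, P, and I.
(defparameter *configurations*
  '((a "X-bar -> [X]")           ; unary branching
    (b "XP -> Xspec [X-bar]")    ; SPEC-initial
    (c "XP -> [X-bar] Xspec")    ; SPEC-final
    (d "X-bar -> [X-bar] YP")    ; head-initial
    (e "X-bar -> YP [X-bar]")))  ; head-final

(defun allowed-configurations (head-initial-p spec-initial-p)
  "Return the letters of the configurations permitted by the two
binary-valued parameters."
  (append (list 'a)                                 ; unary branching is never excluded
          (if spec-initial-p (list 'b) (list 'c))   ; SPEC-initial excludes (c), SPEC-final excludes (b)
          (if head-initial-p (list 'd) (list 'e)))) ; head-initial excludes (e), head-final excludes (d)

(defun describe-configurations (letters)
  "Map configuration letters to their schematic descriptions."
  (mapcar (lambda (letter) (second (assoc letter *configurations*))) letters))

;; English (head-initial, SPEC-initial):
;;   (allowed-configurations t t)   => (A B D)
;; Japanese (head-final, SPEC-initial):
;;   (allowed-configurations nil t) => (A B E)
```

Each language thus keeps exactly three of the five configurations: the unary one plus one choice per parameter.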

In addition to the above variant of X̄ theory, Davra incorporates the linking rule given in section 3.1. This linking rule is simplified in Davra since, unlike Maimra's syntactic theory, Davra's syntactic theory allows only binary branching structures.† Furthermore, like Maimra, Davra adopts two additional restrictions. First, the meaning expressions associated with complements must be variable-free. This eliminates the need to rename variables during the linking process. Second, the meaning expression associated with a head must not be simply a variable. With these restrictions, the linking rule incorporated into Davra can be summarized by the following five cases:


(i)    [φ]        yields  φ
(ii)   [φ]  ψ     yields  φ with ψ substituted for one of its variables
(iii)  ψ  [φ]     yields  φ with ψ substituted for one of its variables
(iv)   [φ]  ⊥     yields  φ
(v)    ⊥  [φ]     yields  φ

where φ and ψ are the meanings of the children, listed in surface order, ψ ≠ ⊥ in cases (ii) and (iii), and the child enclosed in brackets is the head of the head-complement structure. Case (i) is used for unary branching structures of type (a). Both cases (ii) and (iv) apply to SPEC-final structures like (c) and head-initial structures like (d), while both cases (iii) and (v) apply to SPEC-initial structures like (b) and head-final structures like (e). For example, in English, a head-initial language, case (ii) would be used to derive the meaning of from John, namely FROM(John), from FROM(x) and John, the meanings of from and John respectively. Likewise, case (v) would be used to derive the meaning of the book, namely book, from ⊥ and book, the meanings of the and book respectively. In Japanese, a head-final language, case (iii) would be used to derive the meaning of Taro kara, namely FROM(Taro), from Taro and FROM(x), the meanings of Taro and kara respectively.
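The five cases can be sketched deterministically in plain Common Lisp. In this illustration (not Davra's code), meaning expressions are nested lists such as (from x), the symbols listed in the hypothetical *meaning-variables* are the variables, and the symbol bottom stands in for ⊥. Because the text says only that the complement meaning is substituted for some variable in the head meaning, the hypothetical link-meanings returns one result per distinct variable rather than committing to a particular one. Surface order does not affect the resulting meaning, so cases (ii) and (iii), and likewise (iv) and (v), share one branch.

```lisp
;; Assumed representation: meanings are symbols or lists; these symbols
;; are variables; BOTTOM stands for the meaning ⊥.
(defparameter *meaning-variables* '(x y z u v w))

(defun variable-p (expression)
  (and (symbolp expression) (member expression *meaning-variables*) t))

(defun expression-variables (expression)
  "Distinct variables of EXPRESSION in order of first occurrence."
  (let ((found '()))
    (labels ((walk (e)
               (cond ((variable-p e) (pushnew e found))
                     ((consp e) (walk (car e)) (walk (cdr e))))))
      (walk expression))
    (nreverse found)))

(defun link-meanings (head complement)
  "All parent meanings obtainable by linking HEAD with COMPLEMENT."
  (cond ((variable-p head) '())                  ; a head meaning may not be a bare variable
        ((eq complement 'bottom) (list head))    ; cases (iv) and (v)
        ((expression-variables complement) '())  ; complement meanings must be variable-free
        (t (mapcar (lambda (var) (subst complement var head)) ; cases (ii) and (iii):
                   (expression-variables head)))))            ; fill any one variable

;; (link-meanings '(from x) 'john) => ((FROM JOHN))
;; (link-meanings 'book 'bottom)   => (BOOK)
```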

The nodes in the syntactic tree constructed by Davra correspond to substrings of the input utterance in the standard fashion that disallows crossovers. Davra allows non-overt nodes, i.e. nodes that correspond to empty substrings. Both terminal and non-terminal nodes may be non-overt. Davra enforces the constraint that overt terminal nodes correspond to a single word of the input utterance. Furthermore, Davra enforces several additional constraints designed to reduce the size of the search space in the underlying language acquisition task. First, nodes labeled X̄ must be overt. Second, non-overt nodes must be assigned ⊥ as their meaning. Stated informally, this means that non-overt phrases cannot contribute substantive semantic content to an utterance. Finally, any node labeled XP cannot be assigned ⊥ as its meaning.
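The three search-space restrictions amount to a simple admissibility test on a single node. The sketch below is a hypothetical standalone predicate, node-admissible-p, written in plain Common Lisp. It assumes categories are lists such as (n bar), (v p), or (n spec), or bare base categories, as in the fracture listing later in this section, and it uses the symbol bottom in place of ⊥.

```lisp
(defun node-admissible-p (category overt-p meaning)
  "Hypothetical check of the three additional restrictions for one node.
CATEGORY is e.g. (n bar), (v p), (n spec), or a bare base category;
OVERT-P is true when the node spans at least one word; BOTTOM stands for ⊥."
  (let ((level (and (consp category) (second category))))
    (cond ((and (eq level 'bar) (not overt-p)) nil)                ; X-bar nodes must be overt
          ((and (not overt-p) (not (eq meaning 'bottom))) nil)     ; non-overt nodes must mean ⊥
          ((and (eq level 'p) (eq meaning 'bottom)) nil)           ; XP nodes may not mean ⊥
          (t t))))
```

For instance, an empty specifier such as (v spec) with meaning bottom passes, while an empty (n bar) node is rejected.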

For reasons of simplicity, Davra does not generate disjunctive lexicon formulae the way Maimra does. Instead, the design of Davra directly follows the architecture from figure 2.2. Davra retains the entire corpus in memory and tries to find a lexicon and a set of parameter settings that are consistent across this corpus. Davra employs straightforward blind search to find this lexicon and set of parameter settings. The motivation behind the design of Davra was not the construction of an accurate process model of child language acquisition. Davra's use of blind search over a corpus retained in memory is not a plausible process model. It does, however, allow one to determine whether a linguistic theory of the form described above, namely parameterized X̄ theory, offers enough constraint to uniquely determine the lexicon and parameter settings when supplied with a very small corpus. Only once it has been determined that the theory is sufficiently constraining does it make sense to explore more efficient and plausible search algorithms.

†Restricting the linking rule to binary branching structures is not a severe limitation. Most current variants of X̄ theory adopt the binary branching restriction, as it appears to be sufficient to describe the requisite syntactic phenomena.

The linguistic theory incorporated in Davra can be phrased as a simple nondeterministic program that describes the search space for possible lexica and parameter settings. This program, which I will call fracture, operates in a top-down divide-and-conquer fashion where nondeterministic choices are made at each divide-and-conquer step. Backtracking through these nondeterministic choices allows straightforward, though inefficient, search for possible solutions. The divide-and-conquer steps interleave a top-down parsing strategy with the fracturing procedure discussed in section 3.1.

One such nondeterministic path through the divide-and-conquer sequence is illustrated in figure 4.4. For each divide-and-conquer step, fracture is called with three arguments: a phrase, a meaning expression to be associated with that phrase, and a category label for that phrase. At the top level, fracture is called with an input utterance paired nondeterministically with one of its possible meanings. The input utterance is labeled with the category IP.

Several nondeterministic choices are made at each recursive call to fracture. First, the phrase is split into two subphrases. For example, the input phrase The cup slid from John to Mary might be split into the subphrases The cup and slid from John to Mary. The split point is chosen nondeterministically. Second, the SPEC-initial parameter is nondeterministically set to true. This allows the first subphrase to be assigned the category Ispec, which is treated as NP, and the second subphrase to be assigned the category Ī, which is treated as VP. Since Ī is the head of IP, some subexpression of GO(cup, [path FROM(John), TO(Mary)]) is nondeterministically selected, namely cup, and associated with the first subphrase, as this subphrase is the complement. The subexpression cup is then extracted from GO(cup, [path FROM(John), TO(Mary)]), leaving a variable behind, to yield the expression GO(x, [path FROM(John), TO(Mary)]). This meaning expression fragment is then assigned to the head subphrase. The fracture routine is then recursively called on each of the two subphrases with their associated meaning expression fragments and category labels. This recursive process terminates when fracture is called on a singleton word. In this case, a lexical entry is created mapping the word to the given meaning expression and syntactic category label. Figure 4.4 illustrates two such mappings: one from the word the to the category label Nspec and meaning expression ⊥, and one from the word cup to the category label N and meaning expression cup.
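The fracture listing below relies on a function split whose definition is not shown in this section. Assuming it nondeterministically cuts the word list into a contiguous prefix and suffix, either of which may be empty (the listing explicitly rejects empty heads and complements, which suggests empty halves can arise), a deterministic sketch that enumerates every such cut might look like this. The name all-splits is hypothetical, and the sketch returns the list of all choices where the original would backtrack over them.

```lisp
(defun all-splits (words)
  "Every way to cut WORDS into a contiguous prefix and suffix, including
empty halves, as a list of (prefix suffix) pairs in left-to-right order."
  (loop for i from 0 to (length words)
        collect (list (subseq words 0 i) (nthcdr i words))))

;; (all-splits '(the cup))
;;   => ((NIL (THE CUP)) ((THE) (CUP)) ((THE CUP) NIL))
```

An n-word phrase therefore offers n + 1 split points, one of which, for The cup slid from John to Mary, is the NP/VP split shown in figure 4.4.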

The fracture routine makes many nondeterministic choices at each step. For pedagogical purposes, figure 4.4 illustrates a path containing the correct choices, though many alternative paths contain incorrect choices that are filtered out by backtracking. Backtracking is initiated by two types of failure. One type occurs when an attempt is made to set a parameter to a value different from the one it has already been assigned. The linguistic theory incorporated into Davra states that a given language is either head-initial or head-final, but not both. The second type occurs when an attempt is made to create a lexical entry for a word which assigns it a different meaning or syntactic category than it has already been assigned. This is an embodiment of the monosemy constraint.
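Both failure types reduce to one check: a key (a parameter name or a word) may be bound at most once, and any attempt to rebind it to a different value fails. The sketch below shows this with an association list in plain Common Lisp. The function extend-binding and the :fail marker are hypothetical; in Davra the failure would instead trigger backtracking.

```lisp
(defun extend-binding (key value bindings)
  "Return BINDINGS extended with KEY -> VALUE. If KEY is already bound to
an EQUAL value, return BINDINGS unchanged; if it is bound to a different
value (a conflicting parameter setting or a monosemy violation), return :FAIL."
  (let ((old (assoc key bindings :test #'equal)))
    (cond ((null old) (acons key value bindings))  ; first binding for KEY
          ((equal (cdr old) value) bindings)       ; consistent rebinding
          (t :fail))))                             ; conflict: backtrack

;; Parameters: once head-initial? is T, setting it to NIL fails.
;; Lexicon: once cup is (N CUP), assigning it (V CUP) fails.
```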

The nondeterministic search process just described can be written as a program in nondeterministic Lisp (Siskind and McAllester 1992). This program is quite small and modular. An annotated description of its essential routines is given below. It can be seen that this program straightforwardly embodies the linguistic theory stated above.

(defun fracture (words category meaning)
  (declare (special categories head-initial? spec-initial? lexicon))
[Figure 4.4 diagram, reconstructed from the extracted labels:
The cup slid from John to Mary / GO(cup, [path FROM(John), TO(Mary)]) / IP
  fracture: The cup / cup / NP   +   slid from John to Mary / GO(x, [path FROM(John), TO(Mary)]) / VP
  fracture: The / ⊥ / Nspec   +   cup / cup / N]
Figure  4.4:  Davra  incorporates  a  divide-and-conquer  search  strategy  illustrated  by  this  figure.  This 
process  is  embodied  in  a  recursive  routine  called  fracture  which  takes  three  arguments:  a  phrase, 
a  meaning  expression  fragment,  and  a  category  label.  First,  the  phrase  is  nondeterministically  split 
into  two  subphrases.  Next,  the  meaning  expression  fragment  is  nondeterministically  split  into  two 
submeanings, one to be assigned to each subphrase. Finally, X̄ theory determines the category
labels  to  assign  to  each  subphrase  given  the  input  category  label.  Each  subphrase  is  then  recursively 
fractured  with  its  associated  submeaning  and  category  label.  The  recursion  terminates  when  a  single 
word  is  assigned  a  category  and  meaning.  There  may  be  many  possible  divide-and-conquer  paths  due 
to  nondeterminism.  This  figure  illustrates  just  a  portion  of  one  such  path,  the  correct  one.  Davra 
enumerates  all  possible  divide-and-conquer  paths  to  find  those  that  contain  consistent  parameter 
settings,  as  well  as  consistent  word-to-category  and  word-to-meaning  mappings,  across  the  entire 
corpus. 
The essence of Davra is the routine fracture. Fracture attempts to assign a syntactic category label and meaning expression fragment to a list of words. The basic strategy is top-down: nondeterministically split the words into two phrases, a head and a complement; nondeterministically assign part of the parent meaning to the head and part to the complement according to the linking rule; and recursively call fracture on both the head and the complement. This routine uses four pieces of information global to the language acquisition process: the base categories that project into the X̄ system; a flag, head-initial?, indicating whether the language is head-initial or head-final; another flag, spec-initial?, indicating whether the language is SPEC-initial or SPEC-final; and the lexicon, a map from words to their syntactic categories and meanings.

  (if (and (consp category) (eq (second category) 'p) (eq meaning '⊥)) (fail))

The above statement implements the third additional restriction, namely that a node labeled XP cannot have ⊥ as its meaning.

  (if (and (null words) (not (eq meaning '⊥))) (fail))

The above statement implements the second additional restriction, namely that non-overt nodes must be assigned ⊥ as their meaning.

  (cond
    ((equal category '(i spec)) (fracture words '(n p) meaning))
    ((equal category '(i bar)) (fracture words '(v p) meaning))

There are five cases in the fracture routine. The above two cases implement principles 6 and 7 of the variant of X̄ theory presented on page 55 (that Ispec is processed as NP and that Ī is processed as VP).

    ((and (consp category) (eq (second category) 'bar))
     (either
      (fracture words (first category) meaning)

The third case handles phrases of type X̄. A node of category X̄ can be either unary or binary branching. A nondeterministic choice is made between the two by the either clause. The above statement handles the case of unary branching.

      (let* ((split (split words))
             (head (if head-initial? (first split) (second split)))
             (complement (if head-initial? (second split) (first split))))
        (if (null head) (fail))
        (if (null complement) (fail))
        (let ((complement-meaning (possible-complement-meaning meaning)))
          (fracture complement `(,(member-of categories) p) complement-meaning)
          (fracture head category
                    (possible-head-meaning complement-meaning meaning))))))

The above statement implements the second alternative for phrases of type X̄. It nondeterministically splits the phrase into two halves, one to become the head, the other to become the complement. The choice of which half becomes the head, and which the complement, is determined by the head-initial? parameter. Note that the head must not be null, since the first additional restriction states that nodes labeled X̄ must be overt. Furthermore, the complement must not be null, since complements are labeled XP, nodes labeled XP cannot have ⊥ as their meaning, and non-overt nodes must mean ⊥. The routines possible-complement-meaning and possible-head-meaning implement the linking process in reverse. Given a parent meaning, they nondeterministically return all possible head meanings and complement meanings that can combine to form the parent meaning. They will be described in greater detail later. Two recursive calls are made to fracture: one to fracture the complement as a phrase of category YP, nondeterministically for some base category Y, and one to fracture the head as a phrase of category X̄.

    ((and (consp category) (eq (second category) 'p))
     (let* ((split (split words))
            (head (if spec-initial? (second split) (first split)))
            (complement (if spec-initial? (first split) (second split))))
       (if (null head) (fail))
       (let ((complement-meaning (possible-complement-meaning meaning)))
         (fracture complement `(,(first category) spec) complement-meaning)
         (fracture head
                   `(,(first category) bar)
                   (possible-head-meaning complement-meaning meaning)))))

The fourth case handles phrases of type XP. As before, it nondeterministically splits the phrase into two halves, one to become the head, the other to become the complement (in this case actually the specifier). The choice of which half becomes the head, and which the complement, is determined by the spec-initial? parameter. Again, note that the head must not be null, since the first additional restriction states that nodes labeled X̄ must be overt. As before, the parent meaning is nondeterministically divided into a head meaning and a complement meaning. Two recursive calls are made to fracture: one to fracture the complement as a phrase of category Xspec and one to fracture the head as a phrase of category X̄.

    ((or (and (consp category) (eq (second category) 'spec)) (symbolp category))
     (unless (null words)
       (unless (null (rest words)) (fail))
       (let* ((new-definition (list category (canonicalize-meaning meaning)))
              (old-definition (gethash (first words) lexicon)))
         (if old-definition
             (unless (equal new-definition old-definition) (fail))
             (locally-setf (gethash (first words) lexicon) new-definition)))))))

The final case handles terminals. According to principle 3 of the variant of X̄ theory presented on page 55, categories Xspec and base categories X are terminal. A lexical entry comprising a syntactic category and meaning is created. If this word already has a different lexical entry, the monosemy constraint is enforced by failing. If a terminal is non-overt, no lexical entry is added to the lexicon.

(defun subexpression (expression)
  (if (consp expression)
      (either expression (subexpression (member-of (rest expression))))
      expression))

(defun possible-complement-meaning (parent-meaning)
  (either '⊥
          (let ((complement-meaning (subexpression parent-meaning)))
            (unless (variable-free? complement-meaning) (fail))
            (if (equal complement-meaning parent-meaning) (fail))
            complement-meaning)))
The function subexpression nondeterministically returns some subexpression of an expression. The function possible-complement-meaning implements half of the inverse linking rule. It returns possible complement-meanings that can link with an appropriate head meaning to yield the parent-meaning. Such a complement-meaning can be either ⊥ or some subexpression of the parent-meaning. Remember that the linking rule carries two stipulations. First, meanings of complements must be variable-free. Thus complement-meanings containing variables are filtered out. Second, a head cannot have a meaning which is just a variable. If the complement-meaning were the same as the parent-meaning, then the head meaning would have to be just a variable. Thus, complement-meanings which are the same as the parent-meaning are filtered out.
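To make the set of candidates concrete, here is a deterministic analogue of subexpression and possible-complement-meaning in plain Common Lisp: where the original nondeterministically returns one candidate at a time, this sketch returns the whole list, with duplicates removed. It assumes meanings are nested lists whose first element is the operator (so GO(cup, [path FROM(John), TO(Mary)]) is written (go cup (path (from john) (to mary)))), that the symbols in the hypothetical *meaning-variables* are the variables, and that the symbol bottom stands for ⊥. The function names are hypothetical.

```lisp
(defparameter *meaning-variables* '(x y z u v w))

(defun variable-free-p (expression)
  "True when no symbol from *MEANING-VARIABLES* occurs in EXPRESSION."
  (if (consp expression)
      (and (variable-free-p (car expression)) (variable-free-p (cdr expression)))
      (not (member expression *meaning-variables*))))

(defun all-subexpressions (expression)
  "EXPRESSION plus the subexpressions of its arguments (operators excluded)."
  (if (consp expression)
      (cons expression (mapcan #'all-subexpressions (rest expression)))
      (list expression)))

(defun possible-complement-meanings (parent-meaning)
  "BOTTOM plus every variable-free proper subexpression of PARENT-MEANING."
  (cons 'bottom
        (remove-duplicates
         (remove-if (lambda (c)
                      (or (not (variable-free-p c))      ; complements must be variable-free
                          (equal c parent-meaning)))     ; head may not be a bare variable
                    (all-subexpressions parent-meaning))
         :test #'equal :from-end t)))
```

For the utterance in figure 4.4 this yields bottom, cup, the whole path, FROM(John), John, TO(Mary), and Mary; cup is the choice shown in the figure.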

(defun variable-substitute (subexpression expression variable)
  (cond ((equal expression subexpression) (either variable expression))
        ((consp expression)
         (cons (variable-substitute subexpression (car expression) variable)
               (variable-substitute subexpression (cdr expression) variable)))
        (t expression)))

(defun possible-head-meaning (complement-meaning parent-meaning)
  (if (eq complement-meaning '⊥)
      parent-meaning
      (let ((head-meaning
             (variable-substitute
              complement-meaning
              parent-meaning
              (make-variable (1+ (highest-variable parent-meaning))))))
        (if (equal head-meaning parent-meaning) (fail))
        head-meaning)))

The function variable-substitute takes a meaning expression and returns a similar expression in which subexpressions equal to subexpression are nondeterministically either replaced, or not replaced, by a variable. The function possible-head-meaning implements the other half of the inverse linking rule. It returns possible head-meanings that can link with a given complement-meaning to yield the parent-meaning. If the complement-meaning is ⊥, then the head-meaning is the same as the parent-meaning. Otherwise, we nondeterministically substitute a new variable for occurrences of the complement-meaning within the parent-meaning. Since the linking rule requires that the complement meaning be substituted for some variable in the head meaning, when doing the nondeterministic inverse substitution of a variable for occurrences of the complement-meaning in the parent-meaning, we must guarantee that at least one such substitution has occurred. We therefore filter out any head-meaning that is equal to the parent-meaning, since in that case no substitution has occurred.
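A deterministic analogue of variable-substitute and possible-head-meaning, again in plain Common Lisp, makes the enumeration explicit. This sketch assumes the same list representation of meanings, uses the symbol bottom for ⊥, and takes the fresh variable as an argument rather than computing it, since make-variable and highest-variable are not listed in this section. The names all-variable-substitutions and possible-head-meanings are hypothetical.

```lisp
(defun all-variable-substitutions (sub expression variable)
  "Every way of replacing some (possibly none) of the occurrences of SUB
in EXPRESSION by VARIABLE."
  (cond ((equal expression sub) (list variable expression)) ; replace or keep
        ((consp expression)
         ;; Combine every choice for the CAR with every choice for the CDR.
         (loop for a in (all-variable-substitutions sub (car expression) variable)
               nconc (loop for d in (all-variable-substitutions sub (cdr expression) variable)
                           collect (cons a d))))
        (t (list expression))))

(defun possible-head-meanings (complement-meaning parent-meaning variable)
  "Head meanings that, linked with COMPLEMENT-MEANING, can yield PARENT-MEANING.
VARIABLE is assumed to be fresh, i.e. not already used in PARENT-MEANING."
  (if (eq complement-meaning 'bottom)
      (list parent-meaning)                       ; a ⊥ complement contributes nothing
      (remove parent-meaning                      ; at least one substitution is required
              (all-variable-substitutions complement-meaning parent-meaning variable)
              :test #'equal)))

;; (possible-head-meanings 'cup '(go cup (path (from john) (to mary))) 'x)
;;   => ((GO X (PATH (FROM JOHN) (TO MARY))))
```

With two occurrences of the complement meaning, three head meanings survive: both occurrences replaced, or either one alone.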

Davra was presented with the same corpus that was given to Maimra. This corpus is illustrated in figure 4.2. It consists of nine multi-word utterances ranging in length from two to seven words. Each utterance is paired with between three and six possible meaning expressions. Given this corpus, Davra is able to learn the lexicon and parameter settings given in figure 4.5. Inferring this information requires about an hour of elapsed time on a Symbolics XL1200™ computer. Note that Davra determines that the linguistic theory allows the corpus to have only one consistent analysis, where the language is head-initial and SPEC-initial. Furthermore, the theory and corpus together fully determine most of the lexicon. Davra finds unique mappings for all words to their associated meaning expressions and for all but two words to their associated syntactic categories. For example, the linguistic theory generates the corpus only under the assumption that cup is a noun which means object₁, and slid
Head Initial, SPEC Initial.

John:    [N]        person₁
Mary:    [N]        person₂
Bill:    [N]        person₃
cup:     [N]        object₁
the:     [Nspec]    ⊥
rolled:  [V]        GO(x, [path])
ran:     [V]        GO(x, y)
slid:    [V]        GO(x, [path y, z])
faced:   [V]        ORIENT(x, TO(y))
from:    [N, V, P]  FROM(x)
to:      [N, V, P]  TO(x)

Figure 4.5: The lexicon and parameter settings inferred by Davra for the corpus from figure 4.2. Note that Davra has uniquely determined that English is head-initial and SPEC-initial. Furthermore, Davra has converged to a unique word-to-meaning mapping for each word in the corpus, as well as a unique word-to-category mapping for all but two words.


is a verb which means GO(x, [path y, z]). The only language-specific information on which Davra is not able to converge is the syntactic category of the words from and to. It is easy to see that Davra can never uniquely determine that prepositions like from and to should be labeled with category P since, according to the linguistic theory incorporated into Davra, words labeled N and V can appear anywhere words labeled P can appear. This is a shortcoming of Davra that can be addressed by the addition of case theory and c-selection principles. Case theory includes a case filter which states that overt noun phrases must receive case, an abstract property assigned by certain lexical items to certain complement positions. The case filter would not allow from to be labeled with category N since nouns do not assign case to their complements, and thus the noun phrase John in Bill ran from John would not be assigned case. C-selection principles state that certain categories must appear as complements of other specific categories. For example, a verb phrase must appear as the complement of an inflection. This principle would not allow from to be labeled with category V since from John does not appear as the complement of an inflectional element in Bill ran from John. The next section will discuss Kenunia, a system built subsequent to Davra that incorporates such additional linguistic constraints.

As discussed previously, one of the main objectives for Davra was to construct a single linguistic theory that could acquire lexica and parameter settings for different languages. To test the cross-linguistic applicability of Davra, the corpus in figure 4.2 was translated from English to Japanese, retaining the same non-linguistic annotation.‡ The resulting linguistic component of the Japanese corpus is illustrated in figure 4.6. Note that the syntax of Japanese differs from English in a number of key ways. First, Japanese is a head-final language: prepositions follow their complements (and are thus really postpositions) and the underlying word order is subject-object-verb. Second, Japanese subjects are generally marked with the word ga. Third, the Japanese word tachimukau takes a prepositional phrase complement (i.e. Eriko ni) while the corresponding English word faced takes a direct object (i.e. faced Mary).

When presented with this Japanese corpus, Davra produced the lexicon and parameter settings given in figure 4.7. Processing this corpus took about twelve hours of elapsed time on a Symbolics XL1200™ computer. Note that Davra produced essentially the same result for the Japanese corpus as for the

‡I would like to thank Linda Hershenson, Michael Caine, and Yasuo Kagawa, who graciously performed this translation for me.
Taro ga korogashimashita.
Eriko ga korogashimashita.
Yasu ga korogashimashita.
Chawan ga korogashimashita.
Yasu ga Eriko ni hashirimashita.
Yasu ga Taro kara hashirimashita.
Yasu ga chawan ni hashirimashita.
Chawan ga Taro kara Eriko ni suberimashita.
Taro ga Eriko ni tachimukau.

Figure 4.6: The linguistic component of a sample Japanese corpus presented to Davra. This corpus is a translation of the English corpus given in figure 4.2. The non-linguistic component of the Japanese corpus is identical to that of the English corpus.


English corpus despite the syntactic differences between the two languages. Thus Davra determined that Japanese was head-final but SPEC-initial, accounting for the postpositional and verb-final properties. Davra was not hindered by the presence of ga and, by assigning it the meaning expression ⊥, determined that its meaning was outside the realm of the Jackendovian semantic representation used.§ Just as for the English corpus, Davra determines unique word-to-meaning mappings for all words in the Japanese corpus, as well as unique word-to-category mappings for all but two words in that corpus. Davra exhibits the same limitations in Japanese as in English and is unable to narrow the possible syntactic categories assigned to prepositions like kara and ni. Notice, however, that Davra does determine that tachimukau does not incorporate a path in its meaning representation (i.e. ORIENT(x, y)), while faced does (i.e. ORIENT(x, TO(y))), accounting for the different argument structure of these two words.

Thus Davra has been successful as an initial attempt to demonstrate cross-linguistic language acquisition. Davra has simultaneously learned syntactic parameter settings, and a lexicon mapping words to their syntactic categories and meanings, with no prior information of that type, for very small corpora in two different languages.

As was the case for Maimra, the language model produced by Davra does not depend on the order of the utterances in the corpus, since Davra simply finds all language models consistent with the entire corpus. Again, however, the complexity of the search task can depend heavily on the order in which the utterances are presented to Davra. The search space grows intractably large if the corpus is ordered so that earlier utterances have many consistent language models that are filtered out only by later utterances.

4.2.1  Alternate  Search  Strategy  for  Davra 

As discussed previously, one of the unsatisfying aspects of Davra is its use of blind search across the entire corpus retained in memory. For this reason, Davra is not a plausible process model for child language acquisition. An initial experiment was undertaken to explore more plausible alternative learning strategies within the same linguistic theory used by Davra. A different top-level search strategy was built for Davra that retained the same underlying parsing mechanism. This experiment was attempted only for the English corpus. Furthermore, for this experiment, Davra was given the correct parameter settings as input and asked to learn only the lexicon.

§Davra assigns the category Vspec to ga. This is probably not linguistically accurate but is nonetheless consistent with the limited variant of X̄ theory incorporated into Davra.
Head Final, SPEC Initial.

Taro:              [N]        person₁
Eriko:             [N]        person₂
Yasu:              [N]        person₃
chawan:            [N]        object₁
ga:                [Vspec]    ⊥
korogashimashita:  [V]        GO(x, [path])
hashirimashita:    [V]        GO(x, y)
suberimashita:     [V]        GO(x, [path y, z])
tachimukau:        [V]        ORIENT(x, y)
kara:              [N, V, P]  FROM(x)
ni:                [N, V, P]  TO(x)

Figure 4.7: Davra inferred this lexicon and set of parameter settings when processing the Japanese utterances from figure 4.6 paired with the non-linguistic input from figure 4.2. Davra has correctly determined that Japanese is a head-final language. Furthermore, as in figure 4.5, Davra has converged on a single correct meaning for all words in the corpus as well as a single correct category label for all but two words. Note that Davra has determined that the word ga has meaning outside the realm of Jackendovian conceptual structures and that tachimukau does not incorporate a path, in contrast to faced, which does.


The alternate search strategy employed is weaker than strong cross-situational learning. In this strategy, Davra processes the input utterances one by one, retaining only two types of information between utterances: the current hypothesized lexicon and sets of previously tried inconsistent hypotheses. Once Davra processes an utterance, all information about that utterance is discarded, save the above two types of information. Davra starts out with the empty lexicon. When processing each input utterance, Davra searches for an extension to that lexicon that allows the current utterance to meet the constraints imposed by the linguistic theory and non-linguistic input. The extension must obey the monosemy constraint, i.e. new words can be assigned an arbitrary lexical entry but words encountered in previous utterances must be interpreted according to the lexical entries already in the lexicon. There may be several different extensions, i.e. several different assignments of lexical entries to novel words, which are consistent with the current utterance. In this case, Davra arbitrarily picks only one consistent extension. If Davra is successful in extending the lexicon to account for the new utterance, the extension is adopted, the utterance discarded, and processing continues with the next utterance. (The extended lexicon might be the same as the previous lexicon if the input utterance does not contain novel words and can be parsed with the existing lexicon.)

More often, Davra is unsuccessful in finding a consistent extension, as would happen if Davra previously selected the wrong extension, thus making incorrect hypotheses about lexical entries. In this case, Davra attempts to find a small subset of the lexicon that is inconsistent with the current utterance. Such a subset of the lexicon is termed a nogood because it rules out any superset of that subset as a potential hypothesized lexicon. In particular, Davra finds a nogood N such that no extension of N allows the current utterance to be parsed, yet removing any single lexical entry from N yields an N' which can be extended to parse the current utterance. A nogood that has this property is called a minimal nogood. Davra constructs minimal nogoods by a simple linear process when the current lexicon cannot be extended to parse the input utterance. Davra starts out taking the entire current lexicon as the initial nogood. Lexical entries are removed from this initial nogood one by one and the resulting nogood is tested to see whether it can be extended to parse the input utterance. If it can, the lexical entry just dropped is put back in the initial nogood. Otherwise, it is discarded. It is easy to see that this linear process will produce a minimal nogood with the two aforementioned properties.
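The linear process just described is a deletion filter. The following is a minimal Python sketch of it, not Davra's code: lexical entries are assumed to be hashable (word, meaning) pairs, and the `extendable` argument is a hypothetical oracle standing in for Davra's "can this subset be extended to parse the current utterance" test. The result is minimal provided the oracle is monotone, i.e. every subset of an extendable set is itself extendable.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Entry = Tuple[str, str]  # (word, meaning); a hypothetical toy representation


def minimal_nogood(lexicon: Iterable[Entry],
                   extendable: Callable[[FrozenSet[Entry]], bool]) -> FrozenSet[Entry]:
    """Start from the whole lexicon (assumed inconsistent with the current
    utterance) and drop entries one at a time, putting back any entry whose
    removal makes the remainder extendable to a parse."""
    nogood = set(lexicon)
    for entry in sorted(nogood):      # iterate over a sorted snapshot
        nogood.discard(entry)
        if extendable(frozenset(nogood)):
            nogood.add(entry)         # entry is essential to the conflict
    return frozenset(nogood)


# Hypothetical oracle: a subset cannot be extended iff it holds both entries.
CONFLICT = {("John", "GO"), ("ran", "PERSON1")}


def oracle(subset: FrozenSet[Entry]) -> bool:
    return not CONFLICT <= subset


print(sorted(minimal_nogood(CONFLICT | {("cup", "CUP")}, oracle)))
# [('John', 'GO'), ('ran', 'PERSON1')]
```

The irrelevant entry for cup is discarded, while each entry of the conflicting pair is restored because removing it alone would make the remainder extendable.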

Two things are then done with the nogood just constructed. First, it is saved on a list of discovered nogoods. Whenever Davra later extends the lexicon, the extended lexicon is checked to see that it is not a superset of any previously created nogood. Extensions that are supersets of some nogood are not considered. In this way Davra is guaranteed not to make the same mistake twice. Second, one lexical entry is selected arbitrarily from the current nogood. This lexical entry is removed from the current lexicon and a new attempt is made to extend the resulting lexicon to parse the current input utterance.
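Putting the pieces together, the revised strategy (extend under monosemy, record a minimal nogood on failure, retract one of its entries, retry) can be sketched in Python. This is an illustration under loud assumptions, not Davra's implementation: the parser is a toy in which each word means one symbol and an utterance parses when its words' meanings are exactly the symbols of its context, and sorted order plus `min()` stand in for Davra's "arbitrary" choices. As in the English experiment reported in this section, the corpus is reprocessed until a pass needs no retraction.

```python
from itertools import product
from typing import FrozenSet, Iterator, List, Tuple

Entry = Tuple[str, str]                             # (word, meaning symbol)
Lexicon = FrozenSet[Entry]
Utterance = Tuple[Tuple[str, ...], FrozenSet[str]]  # (words, context symbols)


def parses(lexicon: Lexicon, utt: Utterance) -> bool:
    # Toy stand-in for Davra's parser (an assumption, not Davra's theory):
    # the meanings of the words must be exactly the context symbols.
    words, context = utt
    meaning = dict(lexicon)
    return all(w in meaning for w in words) and {meaning[w] for w in words} == context


def extensions(lexicon: Lexicon, utt: Utterance,
               nogoods: List[Lexicon]) -> Iterator[Lexicon]:
    # Monosemy: known words keep their entries; only novel words get new ones.
    # Extensions that are supersets of a recorded nogood are skipped.
    words, context = utt
    known = {w for w, _ in lexicon}
    novel = sorted({w for w in words if w not in known})
    for meanings in product(sorted(context), repeat=len(novel)):
        candidate = lexicon | frozenset(zip(novel, meanings))
        if parses(candidate, utt) and not any(n <= candidate for n in nogoods):
            yield candidate


def minimal_nogood(lexicon: Lexicon, utt: Utterance,
                   nogoods: List[Lexicon]) -> Lexicon:
    # The linear deletion process described above.
    nogood = set(lexicon)
    for entry in sorted(lexicon):
        nogood.discard(entry)
        if next(extensions(frozenset(nogood), utt, nogoods), None) is not None:
            nogood.add(entry)
    return frozenset(nogood)


def learn(corpus: List[Utterance], max_passes: int = 20) -> Tuple[Lexicon, List[Lexicon]]:
    lexicon: Lexicon = frozenset()
    nogoods: List[Lexicon] = []
    for _ in range(max_passes):
        retracted = False
        for utt in corpus:
            while True:
                extended = next(extensions(lexicon, utt, nogoods), None)
                if extended is not None:            # adopt one extension
                    lexicon = extended
                    break
                nogood = minimal_nogood(lexicon, utt, nogoods)
                if not nogood:
                    raise ValueError("empty nogood: no lexicon is consistent")
                nogoods.append(nogood)              # never repeat this mistake
                lexicon = lexicon - {min(nogood)}   # retract one nogood entry
                retracted = True
        if not retracted:                           # a clean pass: converged
            return lexicon, nogoods
    raise RuntimeError("no convergence within max_passes")


corpus = [
    (("John", "ran"), frozenset({"PERSON1", "GO"})),
    (("John", "slid"), frozenset({"PERSON1", "SLIDE"})),
    (("Mary", "ran"), frozenset({"PERSON2", "GO"})),
]
lexicon, nogoods = learn(corpus)
print(sorted(lexicon))
# [('John', 'PERSON1'), ('Mary', 'PERSON2'), ('ran', 'GO'), ('slid', 'SLIDE')]
```

On this three-utterance toy corpus the first hypothesis (John meaning GO) is wrong; the sketch records four nogoods over three passes before settling on the correct lexicon, and the outcome depends on utterance order, in line with the caveat about corpus ordering.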

The revised search strategy used by Davra is similar in many ways to Mitchell's (1977) version space learning algorithm. Mitchell's algorithm was originally formulated for the concept learning problem, a more general task than language acquisition. In concept learning, the learner is presented with a stream of instances from some instance space. Each input instance is labeled as either a positive or a negative instance of the concept to be learned. A concept is a total predicate C such that C(x) returns true if x is an instance of the concept and false otherwise. Concepts are chosen from a finite set 𝒞 called the concept space. The task faced by the concept learner is to select those C ∈ 𝒞 such that C(x) is true for each positive instance in the training set and false for each negative instance in the training set. Such a concept is said to cover the training set. Though general concept learning allows both positive and negative instances to appear in the input, I consider here only the restricted problem which utilizes positive input instances, since only that portion is relevant to the comparison with the search strategy used by Davra. Mitchell's version space algorithm operates as follows. First, a concept C' is called more general than a concept C if for all x in the instance space, C(x) → C'(x). Likewise, a concept C'' is called more specific than a concept C if for all x in the instance space, C''(x) → C(x). As Mitchell's algorithm processes the instances one by one, it maintains a set S of concepts that satisfies two properties. First, each concept C ∈ S must cover the set of instances processed so far. Second, for each concept C ∈ S there cannot be a strictly more specific concept C' ∈ 𝒞 that also covers the set of instances processed so far. These properties are met by initializing S to contain the most specific concepts in 𝒞 and updating S after processing each instance x by replacing those C ∈ S for which C(x) returns false with the most specific generalizations C' of C for which C'(x) returns true.
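The positive-only update of S described above can be sketched in Python. This sketch assumes concepts are represented extensionally, as frozensets of the instances they cover, so that "C' is more general than C" becomes the superset test `C <= C'`; the concept space of intervals over 1..4 is a made-up example, not taken from Mitchell.

```python
from typing import FrozenSet, Iterable, List, Set

Concept = FrozenSet[int]  # a concept as the set of instances it covers


def most_specific(concepts: Iterable[Concept]) -> Set[Concept]:
    """Keep the concepts with no strictly more specific concept in the set."""
    cs = set(concepts)
    return {c for c in cs if not any(d < c for d in cs)}


def update_s(S: Set[Concept], x: int, space: List[Concept]) -> Set[Concept]:
    """Process a positive instance x: keep members of S that cover x and replace
    the others by their most specific generalizations in `space` covering x."""
    new: Set[Concept] = set()
    for c in S:
        if x in c:
            new.add(c)
        else:
            new |= most_specific(d for d in space if c <= d and x in d)
    return most_specific(new)


# Hypothetical concept space: the empty concept plus every interval over 1..4.
space = [frozenset()] + [frozenset(range(a, b + 1))
                         for a in range(1, 5) for b in range(a, 5)]
S = most_specific(space)        # initially only the empty concept
for x in (2, 3):
    S = update_s(S, x, space)
print(S)  # {frozenset({2, 3})}
```

In this toy space S stays a singleton; in richer spaces a non-covering concept can have several incomparable most specific generalizations, and Mitchell's algorithm keeps all of them.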

During the operation of Mitchell's algorithm, the target concept must always be more general than every element of S. Furthermore, any concept that is strictly less general than some element of S can be ruled out as a potential target concept. The set S can be seen as a border, dividing the concept space 𝒞 into two regions, one containing potential target concepts, the other containing those concepts ruled out as potential target concepts. The ability of S to rule out potential concepts is analogous to the set of nogoods used by Davra's revised search strategy. The analogy can be made explicit as follows. Each utterance paired with its non-linguistic input is an instance of the concept to be learned, where concepts are language models. A language model returns true for an instance if some extension of that model allows the instance to be parsed. One language model is more general than another if the former is a subset of the latter. The set of nogoods maintained by Davra corresponds to S with one minor variation: S contains the most specific concepts which cover the input, while a nogood is a most general concept which does not cover the input. The set of nogoods maintained by Davra thus constitutes one side of the border of the region bounding potential target concepts, while S constitutes the other side. Modulo these differences, this border can be considered a frontier, which we can take to be on the same side of the border as in Mitchell's algorithm. Mitchell's algorithm uses the frontier to constrain the region of potential target concepts. Davra, however, uses the frontier only to rule out potential target concepts. For reasons of efficiency, Davra maintains a less tight frontier than does Mitchell's algorithm, ruling out fewer potential target concepts. This less tight frontier is the result of the following differences between Davra and Mitchell's algorithm. First, Mitchell's algorithm initializes the frontier to contain all of the most specific concepts in 𝒞, while Davra's initial frontier consists of a single concept, the current language model. Second, Mitchell's algorithm replaces all elements of the frontier with their generalizations when those elements do not cover an input instance, while Davra replaces only one element of the frontier, namely the current language model, with its generalizations when it does not cover an input instance. Finally, when an element of the frontier does not cover an input instance, Mitchell's algorithm replaces it with all of the most specific generalizations which do cover the input instance, while Davra replaces such an element with only one most specific generalization which covers the input instance.

The aforementioned strategy was applied to the English corpus from figure 4.2. Since this strategy is weaker than strong cross-situational learning, the corpus is too short to allow Davra to converge to a correct lexicon. In the absence of a larger corpus, the existing corpus was repeatedly applied as input to the alternate strategy until Davra was able to make a complete pass through the corpus without needing to retract any lexical entries. Davra required two passes through the corpus in figure 4.2 for convergence and produced the same lexicon as shown in figure 4.5 as output. This strategy required only a few minutes of elapsed time on a Symbolics XL1200 computer.

Note that, as formulated above, this strategy simply finds a single consistent lexicon. It does not determine that the linguistic theory and corpus imply a unique solution. One could extend this technique to determine all solutions by temporarily ruling out each solution as it is found and continuing the search for further solutions. This is done by considering each solution to be a nogood. No further solutions can be found when the empty nogood is produced. While it may be expensive to determine all solutions, a variant of this technique can be used to determine whether or not the learner has converged to a unique solution by simply checking whether a single additional solution exists. Also note that, unlike the original implementation of Davra, the rate of convergence of this revised search strategy is dependent on the order in which utterances are processed. Future work will attempt to quantify the sensitivity of this search strategy to corpus ordering.


4.3  Kenunia 

Like Maimra, Davra also suffers from a number of shortcomings that limit its viability as a complete theory of child language acquisition. Accordingly, I have constructed a third system, Kenunia, that attempts to address some of these shortcomings.

4.3.1  Overview  of  Kenunia 

The following summarizes the limitations in Davra addressed by Kenunia.

• Davra's syntactic theory is specified by setting two binary-valued parameters: head-initial/final and SPEC-initial/final. Thus, except for lexical differences, Davra can support only four distinct language types. In Kenunia, the analogs of the head-initial/final and SPEC-initial/final parameters vary on a category-by-category basis, increasing the possible parametric diversity of languages to be learned. Furthermore, since Kenunia supports base adjunction, additional parameters specify the adjunction order, again on a category-by-category basis. The Kenunia syntactic theory is specified by setting sixteen binary-valued parameters, supporting 65,536 distinct possible language types to be learned, independent of lexical variation.

• The syntactic theory incorporated into Davra is little more than X-bar theory. Kenunia instead incorporates a much more substantial linguistic theory including X-bar theory, movement, θ-theory, case theory, and the empty category principle (ECP). While the variant of X-bar theory incorporated into Davra supports only head-complement structures over the categories N, V, and P, the variant incorporated into Kenunia supports head-complement structures, as well as free base adjunction, over the categories N, V, P, D, I, and C. Furthermore, the syntactic theory used by Kenunia incorporates a number of current linguistic notions, such as VP-internal subjects, the DP hypothesis, and V-to-I movement.

• Davra supports only a weak notion of empty category. Davra allows a terminal node to be non-overt so long as it does not contribute any semantic content to the resulting utterance. Kenunia extends this capacity to provide for both non-overt words and movement with its ensuing traces. Kenunia incorporates the general notion of an empty terminal, a terminal with no overt phonological content. Kenunia supports two types of empty terminals: traces, which are bound by an antecedent arising from movement, and zeros, words or morphemes which are not phonologically overt but nonetheless contain the same full range of linguistic information as other overt elements. Thus, unlike in Davra, in Kenunia a language has an inventory of zeros, each of which has a specific syntactic category and contributes specific semantic content to utterances in which it appears. A severe problem facing any theory of language acquisition is the need to explain how children can learn the inventory of non-overt elements and their linguistic features. Furthermore, one must also explain how children learn in the presence of movement. This clearly holds for uncontroversial forms of movement such as Wh-movement. It is exacerbated by the current trend in linguistics to postulate radical forms of movement and numerous non-overt elements. VP-internal subjects and V-to-I movement are two examples of such radical forms of movement, while the Larson/Pesetsky analysis of the ditransitive is an example that requires the child to learn non-overt prepositions that bear specific lexical features. While Kenunia cannot currently handle all such phenomena, the long-term objective is to tackle this problem head on and develop a theory that can explain language learning in the presence of movement and non-overt elements.

• Davra, like Maimra, represents word and utterance meanings using Jackendovian conceptual structures. The semantic theory used by Maimra and Davra relates the meaning of an utterance to the meanings of its constituent words via a linking rule based on substitution. Part II of this thesis will discuss many of the shortcomings of both the Jackendovian representation and its associated linking rule. Basing a theory of language acquisition on such a questionable semantic theory renders the language acquisition theory suspect. The ultimate goal of this research is to develop a comprehensive theory of language acquisition using the semantic representation to be discussed in part II of this thesis as its basis. Since that representation is not yet fully formulated, Kenunia adopts a temporary stopgap measure: it uses θ-theory as its semantic representation. The rationale behind this move is simple. Basing the theory of language acquisition on the weakest possible, least controversial, semantic theory can yield a more robust theory of language acquisition. The fewer assumptions one makes about the semantic theory, the less likely it is that the theory will need to be retracted as a result of falsifying some semantic assumption.

Maimra and Davra represented word and utterance meanings as conceptual structure fragments. The meaning of John might be person1, while the meaning of walked might be GO(x, [Path ]). The linking rule would combine these two fragments to yield GO(person1, [Path ]), the meaning of John walked. Kenunia instead represents word meanings via two components: a referent and a θ-grid. The referent of a word is simply a token denoting the object to which that word refers. For example, the referent of the word John might be person1, while the referent of the word cup might be object1. Words such as the, walk, and slide, which do not refer to anything, are assigned ⊥ as their referent.

A θ-grid denotes the argument-taking properties of a word. Conceptually, a word assigns a distinct θ-role to each of its arguments. The θ-grid specifies which θ-role is assigned to which argument. Formally, a θ-grid consists of a set of θ-assignments, each θ-assignment being a θ-role paired with a complement index, an integer denoting the argument to which that θ-role is to be assigned. Words such as the, John, and cup, which do not take any arguments, would have an empty θ-grid. An intransitive verb such as walk would have {Theme: 1} as its θ-grid. This indicates that walk assigns one θ-role, namely Theme, to its external argument. More formally, the notation Theme: 1 specifies that the complement of the bar-level 1 projection of the terminal node associated with the word walk is assigned the θ-role Theme. Likewise, the θ-grid for a transitive verb such as slide might be {Theme: 0, Agent: 1}, indicating that the θ-role Theme is assigned to the internal argument while Agent is assigned to the external argument. An internal argument is the complement of a bar-level 0 projection, while an external argument is the complement of a bar-level 1 projection. Using complement indices to denote argument positions, instead of the terms internal and external, keeps θ-theory independent of the decision as to the number of bar-levels used by X-bar theory.

The referent and θ-grid components of a word are orthogonal. A given word may have just a referent, just a θ-grid, both, or neither. Typically, however, all words other than nouns will have ⊥ as their referent, and only verbs and prepositions will have non-empty θ-grids.

Kenunia represents utterance meanings via a θ-map that is itself a set of θ-mappings. A θ-mapping is similar to a θ-assignment except that a referent replaces the complement index. Thus the meaning of John walked would be represented in Kenunia as the θ-map {Theme: person1}. This θ-map is derived from the θ-grid for walked and the referent of John by a process called θ-marking. Intuitively, the θ-marking rule combines {Theme: 1}, the θ-grid for walked, with person1, the referent for John, to form the θ-map {Theme: person1} for John walked. A more formal specification of this process will be given later. In Kenunia, θ-marking plays the role previously played by the linking rule used in Maimra and Davra. Thus in Kenunia, the corpus consists of utterances paired with a θ-map instead of a set of meaning expressions. Furthermore, the lexicon maps words to their referents and θ-grids instead of to meaning expression fragments.
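The intuitive θ-marking step can be sketched in a few lines of Python. This is a hypothetical illustration rather than Kenunia's rule: θ-grids are assumed to be dicts from θ-role to complement index, θ-maps are frozensets of (θ-role, referent) pairs, and the function name `theta_mark` and its explicit table of argument referents are inventions of the sketch, since the formal rule is only given later.

```python
from typing import Dict, FrozenSet, Tuple

ThetaGrid = Dict[str, int]              # θ-role -> complement index
ThetaMap = FrozenSet[Tuple[str, str]]   # set of (θ-role, referent) θ-mappings


def theta_mark(grid: ThetaGrid, arguments: Dict[int, str]) -> ThetaMap:
    """Pair each θ-role in the grid with the referent of the argument that
    occupies the corresponding complement position (simplified)."""
    return frozenset((role, arguments[index]) for role, index in grid.items())


# Toy lexical entries: (referent, θ-grid); None stands in for ⊥.
lexicon = {
    "John": ("person1", {}),
    "cup": ("object1", {}),
    "walked": (None, {"Theme": 1}),
    "slide": (None, {"Theme": 0, "Agent": 1}),
}

# "John walked": John is the external argument (complement index 1) of walked.
print(theta_mark(lexicon["walked"][1], {1: lexicon["John"][0]}))
# frozenset({('Theme', 'person1')})
```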

Figure 4.8 illustrates a corpus that has been presented as input to Kenunia. This corpus contains the same nine utterances that were presented to Maimra and Davra except that θ-maps replace the meaning expressions as the non-linguistic input paired with each input utterance. Each utterance in the corpus is paired with a single θ-map. Like the corpora presented to both Maimra and Davra, this corpus also exhibits referential uncertainty. The mechanism used by Kenunia to represent referential uncertainty differs from that used by Maimra and Davra. In Maimra and Davra, each utterance was paired with a set of meaning expressions, only one of which constituted the actual meaning. The same uncertainty mechanism could have been incorporated into Kenunia. This would have entailed pairing each utterance with a set of θ-maps, only one of which corresponded to the θ-map generated by θ-theory for the utterance. Kenunia, however, supports uncertainty in pairing linguistic with non-linguistic input by an even more general mechanism. Kenunia requires only that the actual θ-map produced by applying θ-theory to the input utterance be a subset of the θ-map given as the non-linguistic input paired with that utterance. The referential uncertainty implied by a set of distinct θ-maps can be emulated by this more general mechanism by simply forming a single θ-map that is the union of the individual distinct θ-maps.
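The subset condition and the union construction can be made concrete with a short Python sketch. It assumes θ-maps are frozensets of (θ-role, referent) pairs; the helper names `compatible` and `merge_alternatives` and the two alternative readings are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

ThetaMap = FrozenSet[Tuple[str, str]]   # set of (θ-role, referent) pairs


def compatible(actual: ThetaMap, given: ThetaMap) -> bool:
    """The θ-map derived from the utterance must be a subset of the θ-map
    supplied as non-linguistic input."""
    return actual <= given


def merge_alternatives(alternatives: Iterable[ThetaMap]) -> ThetaMap:
    """Emulate a set of alternative θ-maps with their union."""
    return frozenset().union(*alternatives)


# Two hypothetical readings of one scene: Bill rolling, or the cup rolling.
bill_rolls = frozenset({("Agent", "person3"), ("Theme", "person3")})
cup_rolls = frozenset({("Theme", "object1")})
given = merge_alternatives([bill_rolls, cup_rolls])

print(compatible(cup_rolls, given))  # True
# A mixed θ-map matches neither reading but is still a subset of the union.
print(compatible(frozenset({("Agent", "person3"), ("Theme", "object1")}), given))  # True
```

The second check shows that the union is at least as permissive as the set of alternatives it replaces: it also admits θ-maps that combine mappings drawn from different alternatives.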

4.3.2  Linguistic  Theory  Incorporated  in  Kenunia 

The linguistic theory incorporated into Kenunia can be specified more precisely via the following principles.

1. X-bar theory

tree structure: The linguistic input to Kenunia consists of a sequence of utterances, each utterance being a string of words. Kenunia associates a set of nodes with each utterance. Nodes are organized in a parent-child relationship. Each node except for one has a distinguished node called its parent. The one node without a parent is called the root. Each node also has a (possibly empty) ordered set of nodes called its children. A node with no children is called terminal. Every node is associated with a (possibly empty) substring of the input utterance. Nodes associated with empty substrings are called empty. The substrings associated with nodes obey the following two constraints. First, the substring associated with a non-terminal node must equal the concatenation of the substrings of its children, taken in order. Second, the substring associated with the root must equal the input utterance.

{Agent: person1, Theme: person1}
John rolled.

{Agent: person2, Theme: person2}
Mary rolled.

{Agent: person3, Theme: person3}
Bill rolled.

{Theme: object1}
The cup rolled.

{Agent: person3, Theme: person3, Goal: person2}
Bill ran to Mary.

{Agent: person3, Theme: person3, Source: person1}
Bill ran from John.

{Agent: person3, Theme: person3, Goal: object1}
Bill ran to the cup.

{Theme: object1, Source: person1, Goal: person2}
The cup slid from John to Mary.

{Agent: person1, Patient: person1, Goal: person2}
John faced Mary.

Figure 4.8: A sample corpus presented to Kenunia.

binary  branching:  Each  node  has  at  most  two  children. 

categories: Each node is labeled with a category, which is one of the symbols N, V, P, D, I, or C. Kenunia is written so that the set of possible categories is a parameter of the linguistic theory. Currently, the value of this parameter is given as input to Kenunia; it is not acquired. Future work may explore the feasibility of acquiring the set of possible category labels, i.e. treating category labels as integers and trying sets of ever increasing cardinality until one is found that is consistent with the input.

bar-level: Each node is labeled with a bar-level, an integer between 0 and M. A node labeled with bar-level 0 is called a minimal node, while a node labeled with bar-level M is called a maximal node. Here again, Kenunia is written so that M is a parameter of the linguistic theory. Currently, however, the value of M is fixed at 2 and not acquired. As for categories, future work may explore the feasibility of acquiring the value of M, instead of taking it as a fixed input value.

head-complement and adjunction structures: Each node is either a head-complement structure or an adjunction structure. In head-complement structures, one distinguished child is designated the head while the remaining children (if any) are the complements of the head. The category of a head-complement structure must be the same as the category of its head, while the bar-level of a head-complement structure must be one greater than the bar-level of its head. For an adjunction structure, one distinguished child is designated the head while the remaining children are the adjuncts of the head. An adjunction structure must have at least one adjunct. Both the category and bar-level of an adjunction structure must be the same as those of its head. Complements must be maximal nodes, while adjuncts must be either minimal or maximal nodes. This principle, combined with the principle of binary branching, implies that all non-terminal nodes have one of the following five configurations.

[Diagrams (a) through (e): the five permitted non-terminal configurations.]

Empty terminal nodes are typically called empty categories in linguistic parlance. This introduces an ambiguity in the term category, sometimes referring to a label for a node, for instance N, V, or P, and sometimes referring to a node bearing a particular label. In this formulation, I use the distinct terms category and node for these two different uses, and thus what are typically called empty categories are here referred to as empty (terminal) nodes.

The terminology used here differs somewhat from current linguistic parlance. According to my use of the term head, the head of an X^2 node is its X^1 child, while in standard usage it is the X^0 child of the X^1 node. Furthermore, I use


phrase order parameters: For each category X and each 0 ≤ i < M, the language sets the binary-valued parameter [X^i initial/final] to either initial or final. In languages which set [X^i initial], a head labeled X^i must be the first child of a head-complement structure, while in languages which set [X^i final], it must be the last child. Furthermore, for each category X and each 0 ≤ i ≤ M, the language sets the binary-valued parameter [adjoin X^i left/right] to either left or right. In languages which set [adjoin X^i right], a head labeled X^i must be the first child of an adjunction structure, while in languages which set [adjoin X^i left], it must be the last child. Note that head-complement and adjunction order are specified on a per-category and per-bar-level basis.

c-selection: Any language specifies a finite set 𝒞 of pairs of the form (X, Y) where X and Y are categories. If (X, Y) ∈ 𝒞, we say that X c-selects Y. If X c-selects Y, then two restrictions apply. First, any node labeled X^1 must have a single complement labeled Y^M. This restriction is called c-selection. Second, any node labeled Y^M must be the complement of a node labeled X^1. This restriction is called inverse c-selection. Kenunia is written so that the set 𝒞 of c-selection relations is a parameter of the linguistic theory. Currently, the value of this parameter is given as input to Kenunia; it is not acquired. All of the work described in this chapter assumes a specific set 𝒞 of c-selections, namely that D c-selects N, I c-selects V, and C c-selects I. Future work may explore more basic principles which govern the acquisition of 𝒞.

terminals: Terminals must be either minimal or maximal nodes.

roots:  Root  nodes  must  be  maximal. 
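The structural principles of X-bar theory above can be checked recursively over a candidate tree. The following Python sketch is an illustration, not Kenunia's parser: the `Node` class (storing the head as a child index) and the error strings are hypothetical, c-selection, substrings and movement are not checked, and the parameter tables are assumed to be keyed by (category, bar-level of the head), mirroring [X^i initial/final] and [adjoin X^i left/right].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

M = 2  # maximal bar-level; the text fixes M at 2

Params = Dict[Tuple[str, int], bool]  # (category, head bar-level) -> head first?


@dataclass
class Node:
    category: str                                  # N, V, P, D, I or C
    bar: int                                       # 0 .. M
    children: List["Node"] = field(default_factory=list)
    head: int = 0                                  # index of the head child
    adjunction: bool = False                       # adjunction vs head-complement


def errors(node: Node, head_initial: Params, adjoin_right: Params,
           is_root: bool = True) -> List[str]:
    """Collect violations of the principles above. head_initial[(X, i)] is True
    for [X^i initial]; adjoin_right[(X, i)] is True for [adjoin X^i right]."""
    errs: List[str] = []
    minimal, maximal = node.bar == 0, node.bar == M
    if is_root and not maximal:
        errs.append("root not maximal")
    if not 0 <= node.bar <= M:
        errs.append("bar-level out of range")
    kids = node.children
    if not kids:
        if not (minimal or maximal):
            errs.append("terminal neither minimal nor maximal")
        return errs
    if len(kids) > 2:
        errs.append("not binary branching")
    head = kids[node.head]
    others = [k for i, k in enumerate(kids) if i != node.head]
    if head.category != node.category:
        errs.append("category differs from head")
    if node.adjunction:
        if not others:
            errs.append("adjunction structure without adjunct")
        if node.bar != head.bar:
            errs.append("adjunction changes bar-level")
        if any(a.bar not in (0, M) for a in others):
            errs.append("adjunct neither minimal nor maximal")
        params = adjoin_right
    else:
        if node.bar != head.bar + 1:
            errs.append("bar-level not one greater than head")
        if any(c.bar != M for c in others):
            errs.append("complement not maximal")
        params = head_initial
    if others and len(kids) == 2:
        head_first = params[(head.category, head.bar)]
        if node.head != (0 if head_first else 1):
            errs.append("head on wrong side")
    for k in kids:
        errs.extend(errors(k, head_initial, adjoin_right, is_root=False))
    return errs


# A head-final verb phrase: V^2 -> V^1, V^1 -> D^2 V^0 (complement before head).
v1 = Node("V", 1, children=[Node("D", M), Node("V", 0)], head=1)
vp = Node("V", 2, children=[v1], head=0)
print(errors(vp, head_initial={("V", 0): False}, adjoin_right={}))  # []
```

Setting [V^0 initial] instead, i.e. `head_initial={("V", 0): True}`, makes the same tree report `"head on wrong side"`.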


2. Move-α

Kenunia does not construct an explicit D-structure representation and thus does not represent movement as a correspondence between such a representation and S-structure. Instead, Kenunia operates in a fashion similar to Fong's (1991) parser and constructs only an S-structure representation that is annotated with co-indexing relations between antecedents and their bound traces. Kenunia associates a set M of movement relations with the set of nodes constructed for each input utterance. Each movement relation is an ordered pair of nodes. If M contains the pair (α, β),

a generalization of the term complement to refer to the siblings of heads bearing any bar-level. Standard usage applies the term complement only to siblings of X^0 heads, and instead applies the term SPEC to siblings of X^1 heads. My non-standard use of terminology affords greater uniformity in stating the theory described here.

Note that this formulation of parameter settings is independent of the binary branching principle.

With six categories and M = 2, there are nominally 30 binary-valued parameters. Additional principles and restrictions reduce this to 16 non-degenerate parameters.

This is in accord with the DP hypothesis.

we say that β is the antecedent of α and that α is bound by β. Movement relations are subject to the following constraints.

(a) Bound nodes must be empty terminals. Bound empty terminals are called traces.

(b) Nodes can bind only one trace.

(c) Traces must have only one antecedent.

(d) Antecedents must be either minimal or maximal nodes. This means that only minimal and maximal nodes move.

(e) The head of an adjunction structure cannot be a trace. This means that a base-generated adjunction structure cannot move without its adjuncts.

(f) The head of an adjunction structure cannot be an antecedent. This means that no node can adjoin to a moved node.

(g) Nodes cannot bind themselves. This is part of what is known as the i-within-i constraint.

(h) Antecedents must have the same category and bar-level as their bound traces.

(i) Antecedents must m-command their bound traces. This is a variant of ECP, the empty-category principle.

(j) Antecedents and their bound traces cannot be siblings.

(k) Antecedents must not be θ-marked. The concept of θ-marking will be defined below. This means that a node cannot move to a θ-marked (argument) position.
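The purely local constraints in this list can be checked directly over a candidate set of movement relations. The Python sketch below is illustrative only and is not Kenunia's implementation: the `Node` class, its field names, and `check_movement` are hypothetical, movement relations are written as (bound, antecedent) pairs, and constraints (e), (f), (i), and (k), which need adjunction, m-command, or θ-marking information, are deliberately omitted.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

M = 2  # maximal bar-level, as assumed elsewhere in this chapter


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity
class Node:
    category: str                      # e.g. "V", "I", "D"
    bar: int                           # bar-level, 0..M
    words: Tuple[str, ...] = ()        # overt substring; () for empty terminals
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def is_empty_terminal(node: Node) -> bool:
    return not node.children and not node.words


def check_movement(moves: List[Tuple[Node, Node]]) -> List[str]:
    """Return labels of violated constraints for (bound, antecedent) pairs.

    Only (a), (b), (c), (d), (g), (h) and (j) are checked in this sketch."""
    violations = []
    for bound, ante in moves:
        if not is_empty_terminal(bound):
            violations.append("a")       # bound nodes must be empty terminals
        if ante.bar not in (0, M):
            violations.append("d")       # only minimal or maximal nodes move
        if ante is bound:
            violations.append("g")       # no self-binding
        if (ante.category, ante.bar) != (bound.category, bound.bar):
            violations.append("h")       # same category and bar-level
        if ante.parent is not None and ante.parent is bound.parent:
            violations.append("j")       # antecedent and trace cannot be siblings
    if any(n > 1 for n in Counter(a for _, a in moves).values()):
        violations.append("b")           # a node binds at most one trace
    if any(n > 1 for n in Counter(b for b, _ in moves).values()):
        violations.append("c")           # a trace has at most one antecedent
    return violations


# Usage: subject raising from SPEC of V to SPEC of I.
subject = Node("D", 2, words=("John",))
trace = Node("D", 2)
verb = Node("V", 0, words=("roll",))
vp = Node("V", 2).add(trace, Node("V", 1).add(verb))
ip = Node("I", 2).add(subject, Node("I", 1).add(Node("I", 0, words=("-ed",)), vp))
print(check_movement([(trace, subject)]))   # []
print(check_movement([(verb, subject)]))    # ['a', 'h']
```

Treating nodes by identity rather than by value matters here, since two distinct empty D² nodes are otherwise indistinguishable.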

3. θ-theory

Kenunia incorporates the following variant of θ-theory. As discussed previously, each word has an associated referent and θ-grid. A θ-marking rule is used to construct a θ-map corresponding to an entire utterance from the referents and θ-grids associated with its constituent words. More precisely, a lexicon maps (possibly empty) strings of words to their associated referent and θ-grid. Each terminal is associated with some (possibly empty) substring of the input utterance. Every terminal, except for traces, is assigned both a referent and a θ-grid, in addition to a category and bar-level. This includes both overt and empty terminals. The referent and θ-grid for a terminal are taken from the lexical entry for the (possibly empty) substring of words associated with that terminal.

Intuitively, the θ-marking rule combines a θ-assignment such as Theme : 1 with a referent such as person₁ to form the θ-mapping Theme : person₁. The θ-map for an utterance will contain a number of such θ-mappings, one for each θ-assignment in the θ-grid of each word in the utterance. A word or node with a non-empty θ-grid is called a θ-assigner. θ-theory stipulates that each θ-assigner must discharge its θ-grid. Discharging a θ-grid involves discharging each of its constituent θ-assignments. Discharging a θ-assignment (i.e. assigning a θ-role) is done by θ-marking the appropriate complement of the θ-assigner involved. This involves pairing the referent of a particular word in that complement with the θ-role specified by that θ-assignment and adding the resulting θ-mapping to the θ-map for the utterance. The complement of the θ-assigner thus θ-marked is called a θ-recipient. θ-recipients are said to receive the given θ-role.

I use the terms antecedent and bound here in a much more restricted way than is common in the linguistic literature. Kenunia does not incorporate any binding theory. The terms are used solely to denote the relation between a moved node and a trace created by that movement.

Kenunia currently supports only one type of trace. Kenunia does not currently support parasitic gaps, PRO, pure variables, or operator-variable structures.

The θ-marking rule incorporates the following constraints.
(a) θ-marking is performed at D-structure. This standard assumption has two implications. First, θ-assigners which have moved must discharge their θ-grids from their position at D-structure. In other words, antecedents don't discharge their θ-grids in situ. Instead they discharge their θ-grids from the location of their bound traces. Second, since θ-recipients receive their θ-role in their D-structure position, traces which are θ-marked pass on that θ-role to their antecedent.

(b) θ-assigners must discharge their θ-grids. In other words, if a node assigns a θ-role to its internal argument, for example, then there must be an internal argument to receive that θ-role.

(c) Complements of nodes labeled with non-functional categories must be θ-marked. This constraint is commonly called the θ-criterion. In Kenunia, functional categories are taken to be those which c-select, namely D, I, and C.

(d) The θ-map constructed for an utterance must contain at least one θ-mapping.

The θ-marking rule can be stated more formally as follows. The ultimate antecedent of a node α is

• α itself if α is not a trace, or

• the ultimate antecedent of the antecedent of α if α is a trace.

The ultimate referent of a node α is

• the ultimate referent of the antecedent of α if α is a trace,

• the ultimate referent of the complement of α if α is a head-complement structure and the category of α is a c-selecting category,

• the ultimate referent of the head of α if α is either an adjunction structure or a head-complement structure where the category of α is not a c-selecting category, or

• the referent of α if α is a terminal and not a trace.

Every non-antecedent node α whose ultimate antecedent is a terminal must discharge the θ-grid associated with that ultimate antecedent. If the θ-grid for the ultimate antecedent of α contains the θ-assignment ρ : i, then find the node δ such that

• δ dominates α,

• the bar-level of δ is i,

• δ is not the head of an adjunction structure, and

• no node which dominates α and is dominated by δ is a complement or adjunct,

and form the θ-mapping ρ : μ, where μ is the ultimate referent of the complement of δ.
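The recursive definitions of ultimate antecedent and ultimate referent, together with the search for δ, can be sketched in Python as follows. This is a hypothetical illustration, not Kenunia's code: the `Node` representation, the `role` labels (`"head"`, `"complement"`, `"adjunct"`), and the function names are assumptions, and θ-assignments are written as (role, index) pairs. One case the definition leaves open, a c-selecting node with no complement (such as a D² with no specifier), is assumed here to take the referent of its head.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

C_SELECTING = {"D", "I", "C"}   # functional categories, per constraint (c)


@dataclass(eq=False)  # identity-based equality and hashing
class Node:
    category: str
    bar: int
    referent: Optional[str] = None                 # meaningful for terminals only
    theta_grid: Tuple[Tuple[str, int], ...] = ()   # (role, complement index) pairs
    antecedent: Optional["Node"] = None            # set only on traces
    role: Optional[str] = None                     # "head", "complement" or "adjunct"
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, kid: "Node", role: str) -> "Node":
        kid.role, kid.parent = role, self
        self.children.append(kid)
        return self


def child(node: Node, role: str) -> Optional[Node]:
    return next((c for c in node.children if c.role == role), None)


def ultimate_antecedent(a: Node) -> Node:
    return ultimate_antecedent(a.antecedent) if a.antecedent is not None else a


def ultimate_referent(a: Node) -> Optional[str]:
    if a.antecedent is not None:                   # trace
        return ultimate_referent(a.antecedent)
    if not a.children:                             # overt or zero terminal
        return a.referent
    comp = child(a, "complement")
    if child(a, "adjunct") is None and a.category in C_SELECTING and comp is not None:
        return ultimate_referent(comp)
    return ultimate_referent(child(a, "head"))     # adjunction, non-c-selecting, or no complement


def find_delta(a: Node, i: int) -> Optional[Node]:
    """Climb the head projection of a to the node of bar-level i that is not
    the head of an adjunction structure."""
    d = a
    while d is not None:
        adjoined_to = d.role == "head" and child(d.parent, "adjunct") is not None
        if d.bar == i and not adjoined_to:
            return d
        d = d.parent if d.role == "head" else None   # never climb past a complement or adjunct
    return None


def walk(n: Node):
    yield n
    for c in n.children:
        yield from walk(c)


def theta_map(root: Node):
    """Return (theta-mappings, undischarged theta-assignments) for a tree."""
    nodes = list(walk(root))
    antecedents = {n.antecedent for n in nodes if n.antecedent is not None}
    mappings, undischarged = [], []
    for a in nodes:
        ua = ultimate_antecedent(a)
        if a in antecedents or ua.children:          # only non-antecedents tied to a terminal
            continue
        for role, i in ua.theta_grid:
            d = find_delta(a, i)
            comp = child(d.parent, "complement") if d is not None and d.role == "head" else None
            if comp is None:
                undischarged.append((role, i))
            else:
                mappings.append((role, ultimate_referent(comp)))
    return mappings, undischarged


# "John slid" with theta-grid(slide) = {Theme : 1}: Theme goes to the complement
# of V1, i.e. SPEC of V. The referent "john" is a hypothetical label.
noun = Node("N", 0, referent="john")
np_ = Node("N", 2).add(Node("N", 1).add(noun, "head"), "head")
dp = Node("D", 2).add(Node("D", 1).add(Node("D", 0), "head").add(np_, "complement"), "head")
slide = Node("V", 0, theta_grid=(("Theme", 1),))
vp = Node("V", 2).add(dp, "complement").add(Node("V", 1).add(slide, "head"), "head")
print(theta_map(vp))   # ([('Theme', 'john')], [])
```

A moved V⁰ is skipped because it is an antecedent; its trace, whose ultimate antecedent is that V⁰, discharges the θ-grid from the D-structure position instead, as constraint (a) requires.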

4.  Case  theory 

Kenunia incorporates a variant of the case filter which states that overt maximal D nodes can only appear in one of three places: the complement of an I¹ node, the complement of a P⁰ node, and the complement of a V⁰ node if the V⁰ node assigns a θ-role to its external argument. This latter restriction is a formulation of Burzio's generalization. The above formulation of the case filter assumes that M = 2.
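As a concrete reading of this case filter, the Python predicate below decides whether an overt maximal D node may occupy a given complement position. It is a hypothetical sketch: the function name and the (category, bar-level, θ-grid) encoding of the host are assumptions, and it treats the external argument as the complement of V¹ (complement index M − 1), which is one plausible reading of why the formulation depends on M = 2.

```python
M = 2  # the formulation in the text assumes M = 2


def licenses_case(host_category, host_bar, host_theta_grid=()):
    """May an overt D^M be the complement of this host node?

    host_theta_grid lists (role, index) pairs; for a head trace it would be
    the theta-grid of the trace's ultimate antecedent."""
    if (host_category, host_bar) in {("I", 1), ("P", 0)}:
        return True                       # SPEC of IP, or object of P
    if (host_category, host_bar) == ("V", 0):
        # Burzio's generalization: a verb assigns case only if it theta-marks
        # its external argument, assumed here to be the complement of V^(M-1).
        return any(index == M - 1 for _, index in host_theta_grid)
    return False


print(licenses_case("V", 0, (("Agent", 1), ("Theme", 0))))   # True
print(licenses_case("V", 0, (("Theme", 0),)))                # False
```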

As stated previously, Kenunia does not create an explicit D-structure representation. Kenunia's implementation of θ-marking, however, operates as if such a representation existed, by utilizing movement relations in the S-structure representation to guide the θ-marking process.
5. Monosemy constraint

A lexicon maps word strings to a unique category, bar-level, referent, and θ-grid. The category, bar-level, referent, and θ-grid of terminal nodes (except for traces) must be those projected by the lexicon for the substring associated with the terminal node.

4.3.3  Search  Strategy 

Kenunia uses a variant of the weaker, revised search strategy used by Davra. In this strategy, all language-specific knowledge is maintained as part of a single language model. This language model contains information both about the lexicon and about syntactic parameter settings. The language model consists of a set of propositions. There are six types of proposition, illustrated by the following examples.

1. category(slide) = V

2. bar-level(slide) = 0

3. referent(slide) = ⊥

4. θ-grid(slide) = {Theme : 1}

5. [I⁰ initial]

6. [adjoin I⁰ left]

The first four propositions indicate components of the lexical entry for the word slide. Note that the category, bar-level, referent, and θ-grid for a word are represented as four independent propositions in the language model. The last two propositions indicate parameter settings: in this case the statement that the language is head-initial for inflection nodes and that adjuncts adjoin to the left of I⁰ nodes.

At all times, Kenunia maintains a single set of such propositions that represent the current cumulative hypothesis about the language being learned. The eventual goal is for the initial language model to consist of the empty set of propositions and to have Kenunia acquire all six types of propositions, representing both parameter settings and the syntactic and semantic properties of words. The current implementation, however, learns only parameter settings and syntactic categories. Thus, Kenunia is provided with an initial language model containing propositions detailing the referents and θ-grids for all words, both overt and empty, that appear in the corpus. Kenunia then extends this language model with propositions detailing the categories and bar-levels of those words, as well as the syntactic parameter settings.

Kenunia extends the language model by processing the corpus on an utterance-by-utterance basis. Each utterance is processed and then discarded. No information is retained after processing an input utterance other than the cumulative language model and a set of nogoods to be described shortly. When processing an input utterance, Kenunia simply tries to find a superset of the current language model that allows the input utterance to be parsed. This superset must be consistent in that it cannot assign a parameter two different settings, nor can it assign a word two different categories, bar-levels, referents, or θ-grids. This latter restriction is an embodiment of the monosemy constraint. If Kenunia is successful in finding a consistent superset of the language model capable of parsing the input utterance, this superset is adopted as the new language model, and processing continues with the next utterance in the corpus.
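The consistency requirement on extensions can be made concrete with a small Python sketch. It assumes a hypothetical encoding of each proposition as an (attribute, subject, value) triple, so that a language model is consistent exactly when no (attribute, subject) pair receives two different values. The parser that would propose the additions is not modeled; here they are simply supplied by hand.

```python
from typing import Hashable, Iterable, Optional, Set, Tuple

# A proposition is an (attribute, subject, value) triple, e.g.
# ("category", "slide", "V") or ("head", "I0", "initial").
Proposition = Tuple[str, Hashable, Hashable]


def consistent(props: Iterable[Proposition]) -> bool:
    """No (attribute, subject) pair may receive two different values.

    For word subjects this is the monosemy constraint; for parameter
    subjects it forbids giving a parameter two settings."""
    seen = {}
    for attribute, subject, value in props:
        if seen.setdefault((attribute, subject), value) != value:
            return False
    return True


def extend(model: Set[Proposition], additions: Iterable[Proposition]) -> Optional[frozenset]:
    """Return the superset model | additions if it is consistent, else None."""
    candidate = frozenset(model) | frozenset(additions)
    return candidate if consistent(candidate) else None


model = {
    ("category", "slide", "V"),
    ("bar-level", "slide", 0),
    ("theta-grid", "slide", frozenset({("Theme", 1)})),
    ("head", "I0", "initial"),
    ("adjoin", "I0", "left"),
}
print(extend(model, {("category", "roll", "V")}) is not None)   # True
print(extend(model, {("category", "slide", "I")}))              # None
```

Representing each component of a lexical entry as its own proposition, as the text does, is what lets a later retraction remove, say, only a word's category while keeping its θ-grid.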

So far, the strategy employed by Kenunia is identical to the revised strategy used by Davra. The strategies diverge, however, when Kenunia is unable to find a consistent extension of the language model. In this situation, Davra would compute a minimal nogood. A nogood is a subset of the current language model that is inconsistent, i.e. one that cannot be extended to parse the current input. A nogood is minimal if it has no proper subset which is a nogood. It turns out that the process used by Davra for computing a minimal nogood is intractable. Davra repeatedly tries to remove individual propositions from the nogood, one by one, testing the resulting set for consistency. Although only a linear number of such consistency tests are performed, they are performed on successively smaller sets of propositions. The smaller the language model, the more freedom the parser has in making choices to try to extend that language model to see if it is consistent with the current input. Experience has shown that a parser can work efficiently with either an empty language model or one which is almost fully specified. In the former case, an empty language model places little restriction on finding a consistent extension, and thus one will almost always be found. In the latter case, a highly constrained language model will focus the search and yield very few intermediate analyses. A small but non-empty language model, however, produces a larger number of analyses that must be checked for consistency. For this reason, the strategy used by Davra for computing minimal nogoods turns out to be intractable in practice. Therefore, Kenunia uses nogoods that are not necessarily minimal. When the current language model cannot be extended to parse the input utterance, Kenunia forms a nogood that contains the following propositions.

•  all  of  the  syntactic  parameters 

•  all  category  and  bar-level  propositions  for  words  appearing  in  the  current  input  utterance 

•  all  category  and  bar-level  propositions  for  zeros  in  the  current  language  model 

This nogood, while not minimal, is nonetheless a subset of the current language model and is easy to compute.
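The contrast between the two kinds of nogood can be sketched in Python. `minimal_nogood` is the generic deletion-based scheme described for Davra: it makes one call per proposition to a caller-supplied oracle `can_extend(model)`, each on a smaller set. `kenunia_nogood` builds the cheap, non-minimal nogood from the three bullet points. The oracle, the (attribute, subject, value) encoding, and the attribute names `"head"` and `"adjoin"` for parameters are assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set, Tuple

Proposition = Tuple[str, Hashable, Hashable]      # (attribute, subject, value)
PARAMETER_ATTRIBUTES = {"head", "adjoin"}         # hypothetical parameter encodings


def minimal_nogood(nogood: Iterable[Proposition],
                   can_extend: Callable[[FrozenSet[Proposition]], bool]) -> FrozenSet[Proposition]:
    """Deletion-based minimization: drop each proposition whose removal
    still leaves a set that cannot be extended to parse the input."""
    core = set(nogood)
    for p in sorted(core, key=repr):              # fixed order for reproducibility
        if not can_extend(frozenset(core - {p})):
            core.discard(p)                       # p is not needed to explain the failure
    return frozenset(core)


def kenunia_nogood(model: Iterable[Proposition], words: Set[str],
                   zeros: Set[str]) -> FrozenSet[Proposition]:
    """All parameters, plus category and bar-level propositions for the words
    of the current utterance and for every zero."""
    return frozenset(
        (attribute, subject, value) for attribute, subject, value in model
        if attribute in PARAMETER_ATTRIBUTES
        or (attribute in ("category", "bar-level") and (subject in words or subject in zeros))
    )


# Toy oracle: only the joint assumptions about -ed block the parse.
a, b = ("category", "-ed", "V"), ("bar-level", "-ed", 2)
c, d = ("category", "john", "D"), ("head", "I0", "initial")
oracle = lambda m: not ({a, b} <= m)
print(sorted(minimal_nogood({a, b, c, d}, oracle)))
# [('bar-level', '-ed', 2), ('category', '-ed', 'V')]
```

In the toy run the oracle is free; in a real learner each call is a full parse attempt over a progressively weaker model, which is where the cost described above arises.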

Kenunia uses nogoods thus constructed in two ways. First, the nogood is saved to prevent repeatedly hypothesizing the same language model. Whenever Kenunia attempts to extend a language model, the extension is checked to ensure that it is not a superset of some previously constructed nogood. Extensions that are supersets of some nogood are discarded, since they are inconsistent with prior input. Note that Kenunia does not retain the prior input itself to perform this check of consistency. Only the nogood, the inconsistent language model, is retained to prevent looping. Second, one proposition is selected arbitrarily from the newly constructed nogood. This proposition is removed from the current language model, and a new attempt is made to extend the resulting language model to parse the current input utterance.
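The two uses of nogoods combine into a single loop per utterance, sketched below in Python. The sketch assumes two hypothetical caller-supplied functions: `candidate_extensions(model)` stands in for the parser and yields consistent extensions of the model that parse the current utterance, and `make_nogood(model)` builds a nogood such as the one described above. The arbitrary choice of proposition to retract is modeled with a seeded random generator, and the sketch makes no attempt to guarantee termination.

```python
import random
from typing import Callable, FrozenSet, Hashable, Iterable, List, Tuple

Proposition = Tuple[str, Hashable, Hashable]
Model = FrozenSet[Proposition]


def admissible(extension: Model, nogoods: List[Model]) -> bool:
    """Reject any extension that contains a previously recorded nogood."""
    return not any(nogood <= extension for nogood in nogoods)


def process_utterance(model: Iterable[Proposition],
                      nogoods: List[Model],
                      candidate_extensions: Callable[[Model], Iterable[Model]],
                      make_nogood: Callable[[Model], Model],
                      rng: random.Random) -> Model:
    """Extend the model to cover one utterance, retracting on failure."""
    model = frozenset(model)
    while True:
        extension = next((e for e in candidate_extensions(model)
                          if admissible(e, nogoods)), None)
        if extension is not None:
            return extension
        nogood = make_nogood(model)
        if not nogood:
            raise ValueError("utterance cannot be parsed under any language model")
        nogoods.append(nogood)                          # kept across utterances
        victim = rng.choice(sorted(nogood, key=repr))   # arbitrary choice
        model = model - {victim}


roll_i, roll_v, john_d = ("category", "roll", "I"), ("category", "roll", "V"), ("category", "john", "D")


def candidates(m):      # toy parser: this utterance needs roll to be a V
    return [] if roll_i in m else [m | {roll_v}]


def nogood_for(m):      # toy nogood: every proposition about roll
    return frozenset(p for p in m if p[1] == "roll")


nogoods = []
new_model = process_utterance({roll_i, john_d}, nogoods, candidates, nogood_for, random.Random(0))
print(sorted(new_model))   # [('category', 'john', 'D'), ('category', 'roll', 'V')]
```

Only the nogoods, not earlier utterances, survive between calls, which mirrors the memory restriction stated in the text.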

4.3.4  The  Parser 

A key step of the above learning strategy is determining whether the current language model is consistent with the next input utterance. This requires determining whether the language model, either as it stands or possibly extended, can parse the input utterance. Kenunia uses a parser whose architecture is similar to that described by Fong. The parser consists of a cascade of modules. The first module generates potential S-structure representations corresponding to the input utterance. Each subsequent module can either filter out structures which violate some principle, or adorn a structure with additional information such as θ-markings or movement relations. Since such augmentation of structure can be nondeterministic, the number of structures passed from module to module can both grow as a result of structure augmentation and shrink as a result of filtering. The particular cascade of modules used in Kenunia is illustrated in figure 4.9. Note that X̄ theory must come first, since it is the initial generator. θ-theory must come after Move-α, since θ-marking is performed at D-structure: θ-theory uses the movement relations produced by Move-α to reconstruct the D-structure representation. The case filter depends on Burzio's generalization, which requires determining the θ-grid of a head. Since a head trace inherits its θ-grid from its antecedent, the case filter must come after Move-α as well. Since the case filter only rejects structures and doesn't nondeterministically adorn them, it is more efficient to place it before θ-theory. Thus the cascade order for the parsing modules is fixed.

X̄ theory → Move-α → case filter → θ-theory → valid structures

Figure 4.9: The cascade of modules used in Kenunia's parser.
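The cascade can be modeled as a chain of lazy Python generators, where filtering modules drop structures and adorning modules may emit several variants of each input. In the sketch below, the structures, the toy Move-α variants, and the toy case filter are hypothetical stand-ins; the point is only the data flow, and the call counter shows why a pure filter placed before θ-theory saves work.

```python
from typing import Callable, Iterable, Iterator

Module = Callable[[Iterator], Iterator]


def filter_module(predicate: Callable) -> Module:
    """A module that only rejects structures (like the case filter)."""
    return lambda structures: (s for s in structures if predicate(s))


def adorn_module(adornments: Callable) -> Module:
    """A module that may yield zero, one or several adorned variants per input."""
    return lambda structures: (v for s in structures for v in adornments(s))


def cascade(generated: Iterable, *modules: Module) -> Iterator:
    stream = iter(generated)
    for module in modules:
        stream = module(stream)   # lazily chained: nothing runs until consumed
    return stream


calls = {"theta": 0}


def theta_variants(s):
    calls["theta"] += 1
    return [s + ("theta-map",)]


xbar_output = [("S1",), ("S2",)]                        # stand-ins for X-bar structures
move_alpha = adorn_module(lambda s: [s + ("no-move",), s + ("move",)])
case_filter = filter_module(lambda s: "move" in s)      # toy rejection rule
theta = adorn_module(theta_variants)

valid = list(cascade(xbar_output, move_alpha, case_filter, theta))
print(len(valid), calls["theta"])   # 2 2
```

Swapping the last two modules yields the same two structures but calls the θ module four times, once for every Move-α variant.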

The variant of X̄ theory incorporated into Kenunia generates infinitely many potential X̄ structures corresponding to any given input string. This is because such X̄ structures can repeatedly cascade empty terminals. Kenunia might therefore never terminate trying to parse an utterance which could not be parsed with a given language model. Solving this problem in general requires induction. Lacking the ability to inductively prove that no element of such an infinite set of S-structures meets the subsequent constraints, or some meta-level knowledge which would bound the size of the S-structure representation by a known function of the length of the underlying utterance, Kenunia instead sets a limit k on the number of empty terminals that can be included in a generated S-structure. This single limit k applies collectively to both traces and zeros. The implementation allows the limit k to be adjusted. Preliminary experimentation with different values for k indicates that performance degrades severely when k > 3. All results reported in this chapter therefore assume that k = 3. Kenunia uses an iterative deepening strategy when searching for S-structures which meet the constraints, first enumerating those structures which do not contain any empty terminals, then those which contain one empty terminal, and so forth, terminating after enumerating structures which contain k empty terminals. Thus, while several alternate analyses for an utterance may meet the constraints imposed by the linguistic theory, Kenunia always adopts the analysis with the minimal number of empty terminals. It is this analysis which contributes the necessary extensions to the language model in the search process described previously. There may, however, be several alternate minimal analyses. In this case, an arbitrary one is chosen to extend the language model.
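The iterative deepening loop can be sketched in Python as follows. For brevity the sketch enumerates only the terminal strings obtained by interleaving j empty terminals (written `e`) into the utterance, whereas Kenunia enumerates full S-structures; the predicate `is_valid` is a hypothetical stand-in for the entire module cascade.

```python
from itertools import combinations_with_replacement
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

EMPTY = "e"   # placeholder for an empty terminal (trace or zero)


def with_empties(words: Sequence[str], j: int) -> Iterator[Tuple[str, ...]]:
    """Every way of interleaving j empty terminals into the word string."""
    for slots in combinations_with_replacement(range(len(words) + 1), j):
        out, s = [], 0
        for i in range(len(words) + 1):
            while s < j and slots[s] == i:      # slots come out sorted
                out.append(EMPTY)
                s += 1
            if i < len(words):
                out.append(words[i])
        yield tuple(out)


def minimal_analyses(words: Sequence[str], is_valid: Callable[[Tuple[str, ...]], bool],
                     k: int = 3) -> Tuple[Optional[int], List[Tuple[str, ...]]]:
    """Iterative deepening on the number of empty terminals, up to the limit k."""
    for j in range(k + 1):
        found = [s for s in with_empties(words, j) if is_valid(s)]
        if found:
            return j, found          # Kenunia would adopt any one of these
    return None, []                  # no analysis with at most k empty terminals


print(minimal_analyses(["John", "rolled"], lambda s: s[0] == EMPTY))
# (1, [('e', 'John', 'rolled')])
```

Even this string-level view shows the growth that motivates a small k: an n-word utterance has C(n + j, j) placements of j empty terminals before any tree structure is considered.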

The X̄ theory module operates essentially as a context-free parser. Kenunia generates a context-free grammar corresponding to an instantiation of the aforementioned variant of X̄ theory with the parameter settings in the current language model. For example, the grammar would contain the rule

D¹ → D⁰ N²

if the language model contained the parameter [D⁰ initial]. Alternatively, it would contain the rule

D¹ → N² D⁰

if the language model contained the parameter [D⁰ final]. Given such a context-free grammar, the X̄ theory module uses a variant of the CKY algorithm to generate S-structures. The particular memoization strategy used allows each variant structure to be retrieved in constant time once the well-formed substring table has been constructed in O(n³) time.
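For readers unfamiliar with CKY, the following Python sketch is a textbook CKY recognizer for a grammar with binary and unary rules, illustrated with the two D¹ rules above. It is not Kenunia's parser: it only recognizes, builds no S-structures, and does not implement the constant-time structure retrieval mentioned in the text. Category names such as `"D0"` are ASCII stand-ins for D⁰, and the two-word example is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cky_recognize(words: List[str],
                  lexicon: Dict[str, Set[str]],
                  binary: Iterable[Tuple[str, str, str]],
                  unary: Iterable[Tuple[str, str]],
                  root: str) -> bool:
    """CKY recognition for rules A -> B C and A -> B; O(n^3) in the input length."""
    binary, unary = list(binary), list(unary)
    n = len(words)
    chart = defaultdict(set)

    def close(cell: Set[str]) -> None:        # apply unary rules to a fixpoint
        changed = True
        while changed:
            changed = False
            for a, b in unary:
                if b in cell and a not in cell:
                    cell.add(a)
                    changed = True

    for i, w in enumerate(words):
        chart[i, i + 1] |= lexicon.get(w, set())   # ambiguous words get several labels
        close(chart[i, i + 1])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for m in range(i + 1, j):
                for a, b, c in binary:
                    if b in chart[i, m] and c in chart[m, j]:
                        chart[i, j].add(a)
            close(chart[i, j])
    return root in chart[0, n]


lexicon = {"the": {"D0"}, "ball": {"N0"}}
unary = [("N1", "N0"), ("N2", "N1"), ("D2", "D1")]
initial = [("D1", "D0", "N2")]    # from [D0 initial]
final = [("D1", "N2", "D0")]      # from [D0 final]
print(cky_recognize(["the", "ball"], lexicon, initial, unary, "D2"))   # True
print(cky_recognize(["the", "ball"], lexicon, final, unary, "D2"))     # False
```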

One feature which distinguishes this parser from the parser described by Fong is that it can operate with an incomplete language model. The learning algorithm in which it is embedded must determine whether a given language model can be extended to parse a given utterance, and if so, what the necessary extension is. If, for example, the language model sets neither the [D⁰ initial] nor the [D⁰ final] parameter, then the grammar can simply contain both of the above rules. Since, however, any given language must set the parameter one way or the other, a hypothetical analysis for an utterance could never be correct if one subphrase was generated by one setting and another by the opposite setting. This requires that the X̄ theory module check the output of the CKY parser to guarantee that each structure produced is generated by a consistent set of parameter settings. The necessary extensions to the language model can be recovered by examining a structure output by the final module in the cascade. The language model might also contain incomplete word-to-category and word-to-bar-level mappings. These are handled by treating such words as lexically ambiguous in the CKY algorithm. Here again, since Kenunia must ultimately enforce the monosemy constraint, a hypothetical analysis for an utterance could never be correct if some word appeared more than once in that utterance with different category or bar-level assignments. The X̄ theory module must check the structures produced for such inconsistencies as well.

Kenunia doesn't actually generate a context-free grammar; rather, the parser directly uses the parameter settings. The operation of the parser is most easily explained, however, as if it utilized an intermediate grammar.
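The handling of an incomplete language model can be sketched in Python in two parts. `head_rules` emits one or both head-complement rules depending on whether the relevant parameter is set, tagging each rule with the setting it presupposes, and `presupposed_extension` performs the post-parse check that a single structure never relies on two settings of one parameter or on two labels for one word. Both functions and the (key, value) encoding of assumptions are hypothetical illustrations, not Kenunia's code.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Assumption = Tuple[Hashable, Hashable]     # (key, value), e.g. (("head", "D0"), "initial")


def head_rules(category: str, setting: Optional[str],
               complement: str) -> List[Tuple[Tuple[str, Tuple[str, str]], Assumption]]:
    """Rules X1 -> X0 YP and/or X1 -> YP X0, each tagged with the setting it presupposes."""
    head, parent = f"{category}0", f"{category}1"
    settings = ["initial", "final"] if setting is None else [setting]   # unset: allow both
    return [((parent, (head, complement) if s == "initial" else (complement, head)),
             (("head", head), s)) for s in settings]


def presupposed_extension(assumptions: Iterable[Assumption]) -> Optional[Dict]:
    """Collect the parameter settings and word labels one structure relies on;
    None if it relies on two values for the same key."""
    chosen: Dict = {}
    for key, value in assumptions:
        if chosen.setdefault(key, value) != value:
            return None
    return chosen


print([rule for rule, _ in head_rules("D", None, "N2")])
# [('D1', ('D0', 'N2')), ('D1', ('N2', 'D0'))]
print(presupposed_extension([(("head", "D0"), "initial"), (("head", "D0"), "final")]))   # None
print(presupposed_extension([(("head", "D0"), "initial"), (("category", "john"), "D")]))
# {('head', 'D0'): 'initial', ('category', 'john'): 'D'}
```

When the check succeeds, the returned dictionary is exactly the extension of the language model that the structure presupposes.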

The cascaded parser architecture has the property that the X̄ theory module produces numerous structures that are ultimately filtered out by later modules in the cascade. Since, asymptotically, the processing time is proportional to the number of intermediate structures generated, it is useful to fold as much of the constraint imposed by the later modules as possible into the CKY-based structure generator. There is a limit to how much can be done along these lines, however. Much of the constraint offered by the later modules depends on non-local structural information. By its very nature, a context-free parser can enforce only local constraints on the structures it generates. There are, however, two components of θ-theory which are essentially local and thus can be folded into the context-free structure generator. These are the θ-criterion and the requirement that all nodes discharge their θ-grids. Coupled with the c-selection requirements, these two components can be reformulated as the following pair of constraints. A node Xⁱ must have a complement if both i = 0 and X c-selects, or if the θ-grid of the ultimate head of the node contains a θ-assignment with complement index i. Likewise, a node Xⁱ must not have a complement if the θ-grid of the ultimate head of the node does not contain a θ-assignment with complement index i and X does not c-select. These constraints can be encoded by adding features ±θᵢ to the categories Xⁱ in the context-free grammar. For example, the grammar would contain the rules

V¹[+θ₀] → V⁰[+θ₀] D²

V¹[−θ₀] → V⁰[−θ₀]

but not the rules

V¹[−θ₀] → V⁰[−θ₀] D²

V¹[+θ₀] → V⁰[+θ₀]


Ground  context-free  rules  can  be  generated  by  enumerating  instances  of  such  rule  schemas,  for  all  possible 
unspecified  feature  assignments,  subject  to  the  constraint  that  the  feature  assignments  of  a  node  must 
match  those  of  its  head. 
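Enumerating ground rules from these schemas can be sketched in Python. The sketch reads Xⁱ as the head whose sibling is the complement, so each rule has the form X⁽ⁱ⁺¹⁾[f] → Xⁱ[f], with or without a complement. It uses M = 2, writes the complement as a placeholder `YP` unless one is supplied, and ignores left-to-right order, which the head-direction parameters would fix. For a c-selecting category with i > 0 and −θᵢ, the two constraints above neither require nor forbid a complement, so both rules are generated.

```python
from itertools import product
from typing import List, Tuple

M = 2                           # maximal bar-level
C_SELECTING = {"D", "I", "C"}   # categories that c-select


def ground_rules(category: str, complement: str = "YP") -> List[Tuple[str, tuple]]:
    """Instantiate X^(i+1)[f] -> X^i[f] (complement) for every +/- assignment f
    to the features theta_0 .. theta_(M-1); a node's features match its head's."""
    rules = []
    for signs in product((True, False), repeat=M):
        feats = "".join(f"[{'+' if s else '-'}θ{j}]" for j, s in enumerate(signs))
        for i in range(M):
            required = (i == 0 and category in C_SELECTING) or signs[i]
            forbidden = not signs[i] and category not in C_SELECTING
            options = [True] if required else ([False] if forbidden else [True, False])
            head, lhs = f"{category}{i}{feats}", f"{category}{i + 1}{feats}"
            for with_complement in options:
                rules.append((lhs, (head, complement) if with_complement else (head,)))
    return rules


print(len(ground_rules("V")), len(ground_rules("D")))   # 8 10
```

The four V¹ rules displayed above correspond to the θ₀ half of this enumeration, with θ₁ held at −.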

So far, only the above constraints have been folded into the context-free CKY-based structure generator. There would be substantial algorithmic benefit if all of the remaining modules could be folded in as well. If this could be accomplished, then there would never be any need to enumerate the structures generated by the context-free grammar, since the parser as a whole is used only as a recognizer, to determine whether an utterance is consistent with a given language model. Such recognition could be performed in polynomial time, irrespective of whether the language model was complete or incomplete, notwithstanding the need for consistency checks on the generated structures as discussed previously. This would allow efficient computation of minimal nogoods, since with a CKY-based recognizer there would be no performance penalty for smaller language models over larger ones. Even if some per-structure filtering were required, as is the case for consistency checks, folding more into the generator, enabling it to produce fewer structures which violate subsequent filters, makes the process of computing smaller nogoods more feasible.
4.3.5 Additional Restrictions

Even after folding parts of θ-theory into the context-free S-structure generator, the resulting generator can still produce a large number of intermediate structures. A manageable number of structures is produced when the language model is complete. In such cases, the linguistic theory overgenerates only slightly, with subsequent modules filtering out practically all of the structures generated. Smaller language models, however, generate an astronomical number of intermediate structures. While the linguistic theory may, in principle, be able to filter out all such intermediate structures, it has never succeeded in doing so in practice. Thus, for pragmatic reasons, some additional restrictions are adopted that further constrain the structures generated. Both X̄ theory and Move-α are restricted. Most of the restrictions on X̄ theory apply to adjunction. These include the following.

1. The bar-level of the head of an adjunction structure must be the same as the bar-level of its adjunct. In other words, a node can adjoin only to a node of the same bar-level.

2. Minimal nodes that are the head of an adjunction structure must bear the category label I. In other words, the only minimal node that can be adjoined to is I⁰.

3. Minimal adjunct nodes must bear the category label V. In other words, the only minimal node that can be an adjunct is V⁰.

4. Maximal nodes that are the head of an adjunction structure must be labeled either N or V. In other words, the only maximal nodes that can be adjoined to are NP and VP.

5. Maximal adjunct nodes must be labeled either P or C. In other words, the only maximal nodes that can be adjuncts are PP and CP.

Three further restrictions apply to X̄ theory that do not relate to adjunction.

1. Complements of nodes bearing bar-level 1 must bear the category label D. In other words, specifiers must be DPs.

2. The root must bear the category label C. In other words, utterances must be CPs and not other maximal nodes such as DPs or PPs.

3. Terminals must be either empty or singleton word strings. Kenunia cannot currently handle idioms, or terminals that correspond to more than one word.

All of these restrictions are folded into the context-free grammar used by the X̄ structure generator. With these restrictions, the number of intermediate structures generated is far more manageable. Additionally, two restrictions are imposed on Move-α.

1. Minimal antecedents must bear the category label V. In other words, the only minimal node that can move is V⁰.

2. Maximal antecedents must not bear a c-selected category label. In other words, c-selected nodes such as NP, VP, and IP don't move.
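The restrictions on adjunction and movement amount to a pair of small predicates, sketched here in Python. The sketch assumes M = 2 and that the c-selected categories are exactly N, V, and I (the complements of D, I, and C); the function names are hypothetical, and the three remaining X̄ restrictions (specifiers, root, single-word terminals) are left out.

```python
MINIMAL, MAXIMAL = 0, 2            # bar-levels of minimal and maximal nodes (M = 2)
C_SELECTED = {"N", "V", "I"}       # assumed complements of the c-selecting heads D, I, C


def adjunction_allowed(head_cat: str, head_bar: int, adj_cat: str, adj_bar: int) -> bool:
    if head_bar != adj_bar:                                      # restriction 1
        return False
    if head_bar == MINIMAL:
        return head_cat == "I" and adj_cat == "V"                # restrictions 2 and 3
    if head_bar == MAXIMAL:
        return head_cat in {"N", "V"} and adj_cat in {"P", "C"}  # restrictions 4 and 5
    return True                    # the list says nothing about bar-level 1


def movement_allowed(ante_cat: str, ante_bar: int) -> bool:
    if ante_bar == MINIMAL:
        return ante_cat == "V"                                   # only V0 moves
    if ante_bar == MAXIMAL:
        return ante_cat not in C_SELECTED                        # NP, VP and IP stay put
    return False                   # constraint (d): intermediate nodes never move
```

Written this way, the comparison with Fong's parser is a one-line change: admitting `"I"` among the maximal adjunction hosts.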

Fong's parser implicitly adopts these very same restrictions, with the exception that adjunction to IP is allowed. None of these restrictions seem very principled. Furthermore, some of them appear to be downright wrong. They were chosen because they are the tightest such restrictions which still allow the corpus in figure 4.8 to be parsed. The need for these restrictions is a severe weak link in the current theory. Incorporating these restrictions was dictated by pragmatic expedience: the advantage of getting the system to work at all before getting it to work cleanly. Replacing these ad hoc restrictions with more principled ones remains a prime area for future work.

These restrictions only hold for that portion of Fong's parser which is comparable to Kenunia. In Fong's parser these restrictions do hold of LF movement, adjectives, adverbs, and I-lowering.
Figure 4.10: Kenunia is given these mappings from words and zeros to their referents and θ-grids as prior language-specific knowledge before processing the corpus from figure 4.8.


4.3.6  Kenunia  in  Operation 

Appendix B illustrates Kenunia's application of the above strategy in processing the corpus from figure 4.8. For this run, Kenunia was also given an initial lexicon mapping the words in the corpus, as well as the inventory of zeros, to their referents and θ-grids. This initial lexicon is illustrated in figure 4.10. The initial lexicon did not include any category or bar-level information, nor was Kenunia given any syntactic parameter settings.

Like the revised Davra strategy, Kenunia processes a corpus repeatedly to make up for the lack of a larger corpus. Kenunia makes two passes over the corpus from figure 4.8 before converging on a language model that survives the third pass without need for revision.

This process can be summarized as follows. Starting with an empty language model, Kenunia succeeds in processing the utterance John rolled, forming the incorrect though nonetheless valid structure illustrated on page 235 in appendix B. In doing so, Kenunia assumes that John is a DP, roll is an I⁰, the -ed morpheme is a VP, and the zero lexeme is a C⁰. Kenunia also assumes that the language is I⁰ initial, I¹ final, and final. Kenunia continues processing further input utterances through page 242, successfully extending the language model for each utterance. Though many of the assumptions made are incorrect, they are consistent with both the linguistic theory and the portion of the corpus seen so far. When processing the utterance John faced Mary, however, Kenunia is not able to find a consistent extension of the language model capable of parsing this utterance. This is illustrated on page 243. At this point no single proposition can be retracted from the language model to make it consistent with the current utterance. It is possible, however, to derive a consistent language model by retracting both the assumption that the category of the -ed morpheme is V and the assumption that its bar-level is 2. After retracting these assumptions, Kenunia is able to process this utterance by assuming that -ed is an I⁰. This analysis is illustrated on page 244. Note that in order to make this analysis, Kenunia had to posit a structure that included both V-to-I movement and subject raising from SPEC of V to SPEC of I. Analysis of previous input did not include such movement. There is nothing magical about this transition. Kenunia did not discover the concept of movement. The potential for movement was latent all the time in the linguistic theory with which she was innately endowed. She simply did not have to invoke that potential until the current utterance, for simpler analyses (i.e. ones with fewer empty terminals) sufficed to explain the prior utterances of the corpus.

Note that in appendix B, the symbol 0 denotes a zero, t denotes a trace, X denotes an undetermined category, and n denotes an undetermined bar-level.

After successfully processing the previous utterance with the revised language model, Kenunia
begins processing the corpus again, since the corpus has been exhausted. Kenunia now encounters
problems trying to process the utterance John rolled (page 245). This time, however, a single retraction
suffices to allow Kenunia to continue. She retracts the assumption that roll is labeled I, replacing it
with the assumption that it is labeled V (page 246). Kenunia is then able to successfully parse a few
more utterances until she encounters the utterance Bill ran to Mary on page 250. This requires her to
retract the assumption that run is labeled I and decide instead that run is labeled V (page 251). After
one more retraction, labeling slide as V instead of I on pages 254 and 255, Kenunia is able to make
one complete pass through the corpus without further revision, and thus converges on the lexicon and
parameter settings illustrated in figure 4.11. This language model is consistent with both the corpus
and the linguistic theory. Processing the corpus to produce this language model requires about an hour
of elapsed time on a Sun SPARCstation 2 computer.
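The pass-and-retract behavior traced above can be illustrated with a toy search loop. This is a minimal sketch under strong simplifying assumptions, not Kenunia's implementation: the language model is flattened to a dictionary of word-to-category assumptions, and each utterance is assumed to have been reduced in advance to a list of candidate analyses ordered simplest first (standing in for "fewer empty terminals"). The function names, the rule of trying to retract the oldest assumptions first, and the two-utterance corpus are all hypothetical.

```python
from itertools import combinations

def compatible(model, analysis):
    """True if the analysis agrees with every assumption already in the model."""
    return all(model.get(key, value) == value for key, value in analysis.items())

def extend(model, analyses):
    """Adopt the first compatible analysis (analyses are listed simplest first)."""
    for analysis in analyses:
        if compatible(model, analysis):
            new_model = dict(model)
            new_model.update(analysis)
            return new_model
    return None

def process(model, analyses):
    """Extend the model; if that fails, retract as few assumptions as possible.

    Returns (new_model, revised). Among retractions of equal size, the oldest
    assumptions are tried first (a hypothetical tie-breaking rule).
    """
    extended = extend(model, analyses)
    if extended is not None:
        return extended, False
    keys = list(model)  # dicts keep insertion order: oldest assumption first
    for size in range(1, len(keys) + 1):
        for retracted in combinations(keys, size):
            reduced = {k: v for k, v in model.items() if k not in retracted}
            extended = extend(reduced, analyses)
            if extended is not None:
                return extended, True
    raise ValueError("utterance admits no analysis under the theory")

def learn(corpus, max_passes=10):
    """Repeat passes over the corpus until one full pass needs no retraction."""
    model = {}
    for passes in range(1, max_passes + 1):
        revised = False
        for analyses in corpus:
            model, changed = process(model, analyses)
            revised = revised or changed
        if not revised:
            return model, passes
    raise RuntimeError("no convergence within max_passes")

# Hypothetical toy corpus: each utterance as its candidate analyses, simplest first.
corpus = [
    [{"roll": "I", "-ed": "V"}, {"roll": "V", "-ed": "I"}],  # John rolled
    [{"face": "V", "-ed": "I"}],                             # John faced Mary
]
model, passes = learn(corpus)
print(model, passes)
```

Run as written, the toy learner retracts the label of -ed on the first pass and the label of roll on the second, then completes a third pass with no revision, loosely paralleling the trace above. Because retractions are chosen by a fixed tie-breaking rule, a different rule or utterance order can change the number of passes or even make the learner oscillate, which is why max_passes bounds the loop.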

As with the revised version of Davra, the method described above can determine only that this is
one possible consistent language model, not that it is the only such language model. These methods
can be extended to determine whether the solution is unique by using the same techniques that were
described for Davra. Furthermore, like the revised version of Davra, the rate of convergence of the
search strategy used by Kenunia is dependent on the order in which utterances are processed. Future
work will attempt to quantify the sensitivity of the search strategy to corpus ordering.

From figure 4.11 one can see that Kenunia has arrived at the correct category and bar-level assign-
ments for all of the words in the corpus except cup. Kenunia assigns cup the correct category N, but
incorrectly assigns it bar-level 2 instead of 0. One can easily see that the linguistic theory incorporated
into Kenunia is not able to force a word to be labeled X⁰ instead of X² without seeing that word appear
with either a complement or a specifier. Since cup has an empty θ-grid, it cannot take a complement or
specifier, for that would violate the θ-criterion. Thus Kenunia could never uniquely determine the
bar-level of nouns like cup. This is a shortcoming of the Kenunia linguistic theory for which I do not
yet have a viable solution.

Kenunia likewise makes a number of incorrect parameter-setting decisions. She sets [V⁰ final]
and [C⁰ final]. The former occurs because in the current corpus verbs always raise to adjoin to I.†
There is thus no evidence in S-structure as to the original position of the verb. I do not yet have
a viable solution to this problem. The latter occurs because the corpus does not contain any overt
complementizers. With only zero complementizers, it is equally plausible to postulate that the zero
complementizer follows an utterance as it is to postulate that it precedes the utterance. Encountering
utterances with overt complementizers should remedy this problem.
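Why the corpus cannot decide such a parameter can be seen by linearizing a head and its complement both ways. In the minimal sketch below, the helper linearize and the flat word lists are hypothetical simplifications of the X-bar structures Kenunia actually builds. A zero complementizer contributes no words, so both settings of the head-direction parameter for C⁰ yield the same string; the same reasoning applies to V⁰ when the verb always raises and leaves only an empty trace in its original position.

```python
def linearize(head, complement, head_initial):
    """Order a head (a list of words, possibly empty) relative to its complement."""
    return head + complement if head_initial else complement + head

ip_yield = ["John", "roll", "-ed"]  # surface yield of the IP complement, held fixed

# A zero complementizer has no overt words: both settings give the same string.
zero_c_initial = linearize([], ip_yield, head_initial=True)
zero_c_final = linearize([], ip_yield, head_initial=False)

# An overt complementizer makes the two settings observably different.
overt_c_initial = linearize(["that"], ip_yield, head_initial=True)
overt_c_final = linearize(["that"], ip_yield, head_initial=False)

print(zero_c_initial == zero_c_final)    # positive data cannot choose a setting
print(overt_c_initial == overt_c_final)  # positive data can choose a setting
```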

Kenunia is still very much a work in progress. Three areas need further work. First, as mentioned
before, a number of ad hoc restrictions were adopted as part of the linguistic theory to reduce the
number of intermediate structures generated. Kenunia does not work without such restrictions. A
goal of prime importance is replacing those restrictions with ones which are more soundly motivated, or
perhaps eliminating them entirely by using alternative parsing algorithms. Second, one of the original
goals behind Kenunia was to extend Davra to account for learning in the presence of movement and
empty categories. This goal has been partially achieved, since Kenunia analyzes the corpus in figure 4.8
in terms of V-to-I movement and raising of VP-internal subjects to SPEC of IP. Nonetheless, this success
is partially gratuitous, since such movement is theory-internal. An immediate goal is to exhibit learning in
the presence of less controversial forms of movement, such as Wh- and NP-movement. Doing so would be
a major advance, since no current theory can explain how children learn word meanings in the presence
of such movement. Wh-movement, in particular, is prevalent in parental input to children. Longer-
term goals along these lines would be to explain the acquisition of numerous phenomena associated
with the interaction between the verbal and inflectional systems. These include verb-second/verb-final
phenomena, subject-aux inversion, diathesis alternations (in particular the passive alternation), and the
unergative/unaccusative distinction. Finally, Kenunia does not yet achieve the level of performance that
has been demonstrated with Davra. Davra learns three things: parameter settings, word-to-category
mappings, and word-to-meaning mappings, with no such prior information. Furthermore, Davra does
so for very small corpora in both English and Japanese. Kenunia, on the other hand, learns only two
things: parameter settings and word-to-category mappings. Kenunia must be given word-to-meaning
mappings as prior language-specific input. Furthermore, Kenunia has been demonstrated only on a
very small English corpus. A goal of prime importance is to fully replicate the capabilities of Davra
within the more comprehensive linguistic framework of Kenunia.

†The methods suggested in Lightfoot (1991) work only when the corpus contains some utterances with verbs in their
original position.

Syntactic Parameters:

[V⁰ final]
[V¹ final]
[P⁰ initial]
[D⁰ initial]
[I⁰ initial]
[I¹ final]
[C⁰ final]
[adjoin V¹ right]
[adjoin … left]

Lexicon:

cup:     [N²]    object, {}
-ed:     [I⁰]    ⊥, {}
John:    [D…]    person…, {}
slide:   [V⁰]    ⊥, {Theme: 1}
that:    [X…]    ⊥, {}
0:       [C⁰]    ⊥, {}
face:    [V⁰]    ⊥, {Patient: 1, …}
from:    [P⁰]    ⊥, {Source: 0}
Bill:    […]     person₃, {}
the:     [D⁰]    ⊥, {}
Mary:    [D⁰]    person…, {}
to:      [P⁰]    ⊥, {Goal: 0}
run:     [V⁰]    ⊥, {Theme: 1}
roll:    [V⁰]    ⊥, {Theme: 1}

Figure 4.11: The parameter settings and lexicon derived by Kenunia for the corpus in figure 4.8.
Kenunia derived only the category and bar-level information in the lexicon. The referent and θ-grid
information was given to Kenunia as prior language-specific input.
Chapter 5

Conclusion 


Part 1 of this thesis has addressed the question: What procedure might children employ to learn their
native language without any access to previously acquired language-specific knowledge? I have advo-
cated cross-situational learning as a general framework for answering this question. In chapter 3, I have
demonstrated how cross-situational learning can be more powerful than trigger-based learning and can
bootstrap from an empty language model. Furthermore, in chapter 4, I have demonstrated three imple-
mented systems based on this framework that are capable of acquiring very small language fragments.
Maimra learns both word-to-category and word-to-meaning mappings, for a very small fragment of
English, given prior access only to grammar. Davra learns both word-to-category and word-to-meaning
mappings, as well as the grammar, for very small fragments of both English and Japanese. Kenunia
learns word-to-category mappings along with the grammar, for a very small fragment of English, given
prior access only to word-to-meaning mappings. Each of these systems learns from a corpus, containing
positive-only examples, pairing linguistic information with a representation of its non-linguistic context.
In Maimra and Davra, both word and utterance meanings are represented as Jackendovian conceptual
structures. In Kenunia, θ-theory replaces these conceptual structures as the framework for representing
semantic information. All three systems are capable of learning despite referential uncertainty in the
mapping of utterances to their associated meanings.


5.1  Related  Work 

A number of other researchers have attempted to give procedural accounts of how children might ac-
quire language. These accounts differ from the one given here in a number of ways. Some advance
trigger-based learning (unambiguously augmenting one's language model with information gleaned from
isolated utterances) rather than the cross-situational approach presented here. Most explain only part
of the acquisition process, for instance, acquiring word-to-meaning mappings but not word-to-category
mappings and grammar, or vice versa, assuming that the learner possesses some prior language-specific
knowledge. Furthermore, most do not deal with the problem of referential uncertainty. I will discuss
some related work in detail below. Other important related work which I will not have the opportunity
to discuss includes Granger (1977), Anderson (1981), Selfridge (1981), Berwick (1979, 1982, 1988), and
Suppes et al. (1991).

5.1.1  Semantic  Bootstrapping 

Grimshaw (1979, 1981) and Pinker (1984) have proposed an approach which has been termed the se-
mantic bootstrapping hypothesis. According to this approach, the child is assumed to first learn the
meanings of individual words by an unspecified prior process. Thus at the onset of semantic bootstrap-
ping, a child can already map, for instance, John to John, saw to SEE, and Mary to Mary. Furthermore,
the semantic bootstrapping hypothesis assumes that the child's innate linguistic knowledge contains a
universal default mapping between semantic concept classes and their syntactic realization. This knowl-
edge includes, for instance, the fact that THINGs are realized as nouns and EVENTs are realized as
verbs. Such language-universal default mappings are termed canonical structure realizations. Using such
knowledge, the child can infer that John and Mary are nouns, and saw is a verb, from the observation
that John and Mary are THINGs, and SEE is an EVENT. Furthermore, upon hearing an utterance
such as John saw Mary, a child can infer that the language she is hearing admits utterances of the form
noun-verb-noun.
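The inference just described amounts to a table lookup followed by a concatenation. The minimal sketch below assumes a canonical structure realization table containing only the two mappings mentioned in the text (THING to noun, EVENT to verb) and a lexicon of meanings the child is presumed to know already; the names CSR, KNOWN_MEANINGS, categorize, and utterance_pattern are hypothetical illustrations, not part of Grimshaw's or Pinker's proposals.

```python
# Hypothetical canonical structure realization table: concept class -> category.
CSR = {"THING": "noun", "EVENT": "verb"}

# Word meanings assumed known before semantic bootstrapping begins:
# word -> (concept, concept class).
KNOWN_MEANINGS = {
    "John": ("John", "THING"),
    "Mary": ("Mary", "THING"),
    "saw": ("SEE", "EVENT"),
}

def categorize(word):
    """Infer a word's syntactic category from the class of its known meaning."""
    _concept, concept_class = KNOWN_MEANINGS[word]
    return CSR[concept_class]

def utterance_pattern(words):
    """Category sequence that the learner infers the language admits."""
    return "-".join(categorize(word) for word in words)

print(utterance_pattern(["John", "saw", "Mary"]))  # noun-verb-noun
```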

Pinker (p. 38) illustrates the above strategy via the following example. For simplicity, suppose that
universal grammar was described by the following grammar schema.

S → {NP, VP}

NP → {(DET), N}

VP → {NP, V}

This is a grammar schema in the sense that the order of the constituents in the right-hand sides of
the rules is not specified; the learner must figure out the correct order for the language being learned.
Furthermore, suppose that the above grammar schema was innate. Upon hearing the utterance The boy
threw rocks, the learner could form the following analysis


[S [NP [N the]] [VP [V boy] [NP [N threw] [DET rocks]]]]


and in doing so determine incorrect constituent order parameters and word-to-category mappings for
English. If, however, the learner knew that boy and rocks were nouns, threw was a verb, and the was a
determiner, presumably by applying canonical structure realization rules to their known meanings, she
could determine that only the following structure is possible


[S [NP [DET the] [N boy]] [VP [V threw] [NP [N rocks]]]]


allowing  her  to  infer  the  correct  constituent  order  parameters  for  English. 

The above example works, however, only with the oversimplified grammar schema. If one adopts a
more comprehensive theory of universal grammar, the learner might not be able to uniquely determine
the constituent order parameter settings, even given complete word-to-category mappings for every word
in the input. Take, for instance, the linguistic theory which was described in section …. Under this
theory, the above utterance allows eight different analyses, where the three parameters [V⁰ initial/final],
[V¹ initial/final], and [C⁰ initial/final] each vary independently. One such analysis is shown below.

[Tree diagram: one of the eight analyses of The boy threw rocks.]


Whether semantic bootstrapping is a viable acquisition theory is a question which must be asked indepen-
dently for each linguistic theory proposed. The ability of semantic bootstrapping to uniquely constrain
potential analyses and determine parameter settings decreases as the linguistic theory becomes richer
and allows more variance between languages. Thus it is unclear whether semantic bootstrapping will
explain acquisition under the correct linguistic theory, whenever that is discovered.

The semantic bootstrapping hypothesis makes two crucial assumptions. First, word meanings are
acquired by an unspecified process prior to the acquisition of syntax. This implies that the process used to
acquire word meanings, whatever it is, cannot make use of syntactic information, since such information
is acquired only later. Furthermore, semantic bootstrapping is not a complete account of language
acquisition, since it does not offer an explanation of how that prior task is accomplished. It explains
only how language-specific syntax is acquired, not how word-to-meaning mappings are acquired. Second,
semantic bootstrapping assumes that the learner uses a trigger-based strategy to acquire language-
specific information from isolated situations. Only those situations that uniquely determine language-
specific choices drive the language acquisition process. The above example was a failed attempt at
showing how semantic bootstrapping made such situations more predominant, constraining otherwise
ambiguous situations to be determinate. Furthermore, the assumption that word meanings are acquired
prior to syntax was motivated specifically as a method for constraining ambiguous situations. This
thesis suggests a different approach whereby the learner can acquire partial knowledge from ambiguous
situations and combine such partial knowledge across situations to infer unique solutions that could not
be determined from individual situations alone. This cross-situational approach thus also alleviates the
need to assume prior knowledge, since all such knowledge can be acquired simultaneously by the same
mechanism.
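The contrast between the two strategies can be shown at the level of a single word. In the minimal sketch below, each situation is assumed to supply a set of candidate meanings for a word; the trigger-based learner commits only when a situation leaves exactly one candidate, while the cross-situational learner intersects candidate sets across situations. This is a deliberately reduced illustration with hypothetical function names and data, not the algorithms used by Maimra, Davra, or Kenunia, which search for a whole language model consistent with every utterance.

```python
def trigger_learner(situations):
    """Learn a word's meaning only from situations that determine it uniquely."""
    lexicon = {}
    for word, candidates in situations:
        if len(candidates) == 1:
            lexicon[word] = next(iter(candidates))
    return lexicon

def cross_situational_learner(situations):
    """Intersect the candidate meanings for each word across all its situations."""
    hypotheses = {}
    for word, candidates in situations:
        hypotheses[word] = hypotheses.get(word, set(candidates)) & set(candidates)
    # A word counts as learned once its candidate set has shrunk to one meaning.
    return {word: next(iter(c)) for word, c in hypotheses.items() if len(c) == 1}

# Each situation pairs a word with the meanings it could have there.
situations = [
    ("ball", {"BALL", "JOHN"}),  # ambiguous on its own
    ("ball", {"BALL", "MARY"}),  # ambiguous on its own
]
print(trigger_learner(situations))            # {}
print(cross_situational_learner(situations))  # {'ball': 'BALL'}
```

Neither situation alone fixes the meaning of ball, so the trigger-based learner acquires nothing, while the partial knowledge from both situations combines to a unique answer.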
5.1.2 Syntactic Bootstrapping

In a series of papers (Gleitman 1990, Fisher et al. 1991), Gleitman and her colleagues have proposed
an alternate learning strategy that has become known as syntactic bootstrapping. In contrast to se-
mantic bootstrapping, where knowledge of word meanings guides the acquisition of syntax, syntactic
bootstrapping assumes essentially the reverse, that knowledge of the syntactic structures within which
words appear guides the search for possible meanings. This alternate strategy is best illustrated by the
following example. Suppose a child heard the utterance John threw the ball to Mary in the context
where she observed John throwing a ball to Mary. Furthermore, suppose that the child already knew
that John, ball, and Mary were nouns meaning John, ball, and Mary respectively, that to was a prepo-
sition meaning TO(x), and that the was a determiner denoting a definite reference operator. In this
circumstance, the child lacks only the category and meaning of threw. Finally, suppose that the child
can form a parse tree for the utterance. Gleitman (1990) and Fisher et al. (1991) suggest that such a
parse tree can be constructed using prosodic information available in parental speech to children.* In
this situation, the child can infer that threw must mean "throw" since that is the only meaning consistent
with both the non-linguistic observation and the utterance, given the partial information already
known about the meaning and syntax of that utterance.

The key idea here is that the syntactic information in the utterance acts as a filter on potential word-
to-meaning mappings for the unknown verb threw. At the time the utterance was heard, other things
may have been happening or true in the world. John may have been wearing a red shirt and Mary could
have been walking home from school. Either of these could be the meaning of some potential utterance
in that situation. Thus, a priori, a novel verb heard in this context could mean "wear" or "walk". Yet the
learner could infer that threw could not mean "wear" or "walk" since neither of these could consistently fit
into the utterance template John x the ball to Mary, given both the known meanings of the remaining
words in the utterance and its structure.

As stated above, this strategy differs little from that proposed by Granger (1977), where the meaning
of a single novel word can be determined from context. Gleitman, however, takes the above strategy a step
further. She claims that the structure of an utterance alone can narrow the possible word-to-meaning
mappings for a verb in that utterance, even without knowledge of the meanings of the remaining words.
Suppose that a child observed John pushing a cup off the table, causing it to fall. In this situation, an
utterance can potentially refer to either the pushing event or the falling event. She claims that a child
hearing John pushed the cup would be able to infer that pushed refers to the pushing event and not the
falling event since, structurally, the utterance contains two noun phrases, and the argument structure
of PUSH(x, y), but not FALL(x), is compatible with that structure. Similarly, a child hearing The cup
fell could determine that fell refers to the falling event, and not the pushing event, since its syntactic
structure is compatible with FALL(x), but not PUSH(x, y). A child could make such inferences even
without knowing the meanings of John and cup, so long as she could determine the structure of the
utterance, using, say, prosodic information, and determine that John and the cup were noun phrases,
using other syntactic principles.
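The filtering argument can be reduced to its simplest form: match the number of noun-phrase arguments in the utterance against the arity of each candidate predicate in the observed scene. The sketch below assumes the structure has already been reduced to a count of noun phrases and that arity is the only property compared; both are simplifications, and the names CANDIDATE_MEANINGS and compatible_meanings are hypothetical. Gleitman's proposal relies on richer structural information than an argument count.

```python
# Candidate meanings for the unknown verb in the push-then-fall scene,
# each paired with the number of arguments its predicate takes.
CANDIDATE_MEANINGS = {"PUSH(x, y)": 2, "FALL(x)": 1}

def compatible_meanings(noun_phrases):
    """Keep the meanings whose arity matches the number of NPs in the utterance."""
    return [meaning for meaning, arity in CANDIDATE_MEANINGS.items()
            if arity == noun_phrases]

print(compatible_meanings(2))  # John pushed the cup
print(compatible_meanings(1))  # The cup fell
```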

Gleitman carries this argument even further. In the above examples, structural information was
used only as a filter, to select the correct interpretation from several possible interpretations of a given
non-linguistic observation. She suggests, however, that a verb's subcategorization frame gives substantial
clues as to its meaning, independent of non-linguistic context. For example, the fact that the verb explain
can take a sentential complement, as in John explained that he was late for school, indicates that it is a
verb of cognition-perception. A given verb may admit several different subcategorization frames, each
further limiting its potential meaning. For example, the verb explain can also appear with a direct object
and a destination, as in John explained the facts to Mary, indicating that it is also a verb of transfer.
Taken together, these two utterances strongly limit the possible meaning for explain.

*While they suggest that prosodic information alone can be used to construct the parse, they also assume that the child
knows the syntactic category of the nouns and prepositions in the utterance. Since such category information can clearly
aid the parsing process, I see no reason why they adopt the stronger claim of parsing using only prosodic information, given
that they in any case assume the availability of further information. It would seem more felicitous to assume that the child
can construct a parse tree using whatever information she has available, whether that be syntactic category information,
prosodic information, or both.

As outlined above, syntactic bootstrapping actually comprises two distinct strategies. They can be
summarized by the following two hypotheses.

1. Children can determine the meaning of an unknown verb in an utterance by first determining the
structure of that utterance using prosodic information alone, and then selecting as the correct
verb meaning the one that allows that structure to have an interpretation consistent with non-
linguistic context, given prior knowledge of the categories and meanings of the remaining words in
the utterance.

2.  Children  can  constrain  the  possible  meanings  of  an  unknown  verb  by  finding  those  meanings  that 
are  compatible  with  each  of  the  different  subcategorization  frames  heard  for  that  verb. 
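The second hypothesis amounts to a subset test over frames. In the minimal sketch below, each candidate meaning is assumed to license a fixed set of subcategorization frames, and a meaning survives only if it licenses every frame heard with the verb. The meaning inventory, the frame labels, and the function name are hypothetical; they are chosen only to mirror the explain example above, where a sentential complement and a direct-object-plus-destination frame together narrow the choice.

```python
# Hypothetical inventory: each candidate meaning lists the frames it licenses.
LICENSED_FRAMES = {
    "BELIEVE": {"sentential"},
    "GIVE": {"np_to_pp"},
    "COMMUNICATE": {"sentential", "np_to_pp"},
    "FALL": {"intransitive"},
}

def meanings_compatible_with(frames_heard):
    """Meanings that license every subcategorization frame heard for the verb."""
    return sorted(meaning for meaning, licensed in LICENSED_FRAMES.items()
                  if set(frames_heard) <= licensed)

print(meanings_compatible_with(["sentential"]))              # after one utterance
print(meanings_compatible_with(["sentential", "np_to_pp"]))  # after both utterances
```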

These two hypotheses may be combined to yield a single more comprehensive strategy. Both of these
hypotheses, however, make two crucial assumptions. First, they assume the availability of prior language-
specific information in the form of the word-to-meaning mappings, or at least word-to-category mappings,
for the nouns and prepositions that appear as arguments to the unknown verb. Second, though not
explicitly stated in their work, their methods appear to rely on the ability of prosodic parsing to
determine a unique structure for each utterance. This thesis describes techniques for learning even
without making the limiting assumptions of unambiguous parsing and prior language-specific knowledge.

The techniques described in this thesis could be extended to take prosodic information as input
along with word strings. This would in essence form a synthesis of the ideas presented in this thesis with
those advocated by Gleitman and her colleagues. One must be careful to include only those prosodic
distinctions which are demonstrated to exist in the input, and which can be detected by children. This
would include less information than, say, a full syntactic analysis of the type performed by Kenunia.
Even though such prosodic information might be ambiguous and partial, the strategies described in this
thesis could be used to find a language model which could consistently map the word strings to their
meanings, subject to the constraints implied by the prosodic information. Such prosodic information
would only ease the learning task when compared with the results presented in this thesis. If prosodic
information were only partially available, or even totally absent, performance of this extended technique
would degrade gracefully to the performance of the techniques discussed in this thesis. In order to
experimentally verify this claim, one must formulate a representation for prosodic information, along
with an appropriate linguistic theory constraining the possible syntactic analyses consistent with prosodic
information specified in that representation. Such an experiment awaits future research.


5.1.3 Degree-0+ Learning

Lightfoot (1991) proposes a theory of how children determine parameter settings within a framework of
universal grammar. His central claim is that children use primarily unembedded material as evidence
for the parameter setting process. If this claim is true, a child must have access to sufficient structural
information about the input utterances in order to differentiate embedded from unembedded material.
Deriving such structural information requires that the learner determine constituent order prior to other
parameter settings. Realizing this, Lightfoot suggests that children have access to syntactic category
information before the onset of parameter setting and utilize a strategy whereby they wait for input
utterances which are simple enough to uniquely determine the setting of some parameter.
(3) a. NP → Specifier N'
       N' → (Adj) [N or N'] PP

(7) a. XP → {Specifier, X'}
    b. X' → {X or X', (YP)}

(8) a. the house
    b. students of linguistics, belief that Susan left

Under (7), the linear order of constituents constitutes a parameter that is set on exposure to
some trigger. The English-speaking child hears phrases like (8a) and, after some development,
analyzes them as consisting of two words, one of a closed class (the) and the other of an open
class (house); in light of this and in light of the parameter in (7a), the child adopts the first
rule of (3a). Likewise, exposure to phrases like (8b) suffices to set the parameter in (7b),
such that the second rule of (3a) emerges. [. . .]

Consider, for a moment, the development that must take place before these parameters can
be set. Children acquire the sounds of their languages and come to use men as a word and a
noun with the meaning roughly of the plural of 'man'. This is a nontrivial process, and many
people have explained how it happens. Having established that men is a noun, children
later acquire the constituent structure of men from the city, if I am right, by setting the
parameters in (7) and projecting to NP accordingly via N', yielding

[NP Spec [N' [N' [N men]] [PP from the city]]].

Lebeaux (1988) discusses this aspect of language acquisition very interestingly. In setting
these particular parameters, children operate with partially formed representations that in-
clude [N men], [P from], [Spec the], and [N city]. They are operating not with "raw data" or
mere words but with partially analyzed structures.

Men from the city and similar expressions occur in the child's environment with an appro-
priate frequency, and, given a partially formed grammar whereby men and city are classified
as nouns, a child can assign a projection conforming to (7).

[pp. 6-7]

Lightfoot's proposal is thus very similar to Pinker's in this regard. It tacitly assumes that children
determine constituent order from isolated utterances which uniquely determine that order. It uses a
trigger-based approach, in contrast to the cross-situational strategy advocated in this thesis. It is unclear
whether Lightfoot's central claims about degree-0+ learnability are compatible with a cross-situational
learning strategy. Such an investigation merits future work.

5.1.4  Salveter 

Salveter  (1979,  1982)  describes  a  system  called  Moran  which,  like  Maimra  and  Davra,  learns  word
meanings  from  correlated  linguistic  and  non-linguistic  input.  Moran  is  presented  with  a  sequence  of
utterances.  Each  utterance  is  paired  with  a  sequence  of  two  scenes,  each  described  by  a  conjunction  of  atomic
formulae.  Each  utterance  describes  the  state  change  occurring  between  the  two  scenes  with  which  it
is  paired.  The  utterances  are  presented  to  Moran  in  a  preprocessed  case-frame  format,  not  as  word
strings.  From  each  utterance/scene-description  pair  in  isolation,  Moran  infers  what  Salveter  calls  a
conceptual  meaning  structure  (CMS),  which  attempts  to  capture  the  essence  of  the  meaning  of  the  verb
in  that  utterance.  This  CMS  is  a  subset  of  the  two  scenes  that  identifies  the  portion  of  the  scenes  referred
to  by  the  utterance.  In  this  CMS,  the  arguments  of  the  atomic  formulae  that  are  linked  to  noun  phrases
are  replaced  by  variables  labeled  with  the  syntactic  positions  those  noun  phrases  fill  in  the  utterance.
The  process  of  inferring  CMSs  is  reminiscent  of  the  fracturing  operation  performed  by  Maimra  and
Davra,  whereby  verb  meanings  are  constructed  by  extracting  arguments  from  whole  utterance
meanings.  Moran's  variant  of  this  operation  is  much  simpler  than  the  analogous  operation  performed
by  Maimra  and  Davra,  since  the  linguistic  input  comes  to  Moran  preparsed.  This  preprocessed  input
implicitly  relies  on  prior  language-specific  knowledge  of  both  the  grammar  and  the  syntactic  categories
of  the  words  in  the  utterance.  Moran  does  not  model  the  acquisition  of  grammar  or  syntactic  category
information,  and  furthermore  does  not  deal  with  any  ambiguity  that  might  arise  from  the  parsing  process.
Additionally,  Moran  does  not  deal  with  referential  uncertainty  in  the  corpus.  Furthermore,  the  corpus
presented  to  Moran  relies  on  a  subtle  implicit  link  between  the  objects  in  the  world  and  the  linguistic  tokens
used  to  refer  to  these  objects.  Part  of  the  difficulty  faced  by  Maimra  and  Davra  is  discerning  that  a
linguistic  token  such  as  John  refers  to  a  conceptual  structure  fragment  such  as  John.  Moran  is  given
that  information  a  priori,  due  to  the  lack  of  a  formal  distinction  between  the  notion  of  a  linguistic  token
and  a  conceptual  structure  expression.  Given  this  information,  the  fracturing  process  becomes  trivial.
Moran  therefore  does  not  exhibit  the  cross-situational  behavior  attributed  to  Maimra  and  Davra,
and  in  fact  learns  every  verb  meaning  from  just  a  single  utterance.  This  seems  very  implausible  as  a
model  of  child  language  acquisition.  In  contrast  to  Maimra  and  Davra,  however,  Moran  is  able  to
learn  polysemous  senses  for  verbs:  one  for  each  utterance  provided  for  a  given  verb.  Moran  focuses  on
extracting  the  common  substructure  of  polysemous  meanings,  attempting  to  maximize  commonality
between  different  word  senses  and  to  build  a  catalog  of  higher-level  conceptual  building  blocks,  a  task  not
attempted  by  the  techniques  discussed  in  this  thesis.
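A  minimal  Python  sketch  of  CMS  inference  from  a  single  utterance/scene  pair  follows.  It  rests  on  a  simplification  that  is  ours,  not  Salveter's:  the  "portion  of  the  scenes  referred  to"  is  approximated  by  the  facts  that  differ  between  the  two  scenes.  Scenes  are  sets  of  atomic  formulae  written  as  tuples,  and  the  hypothetical  case_frame  maps  syntactic  positions  to  the  referents  of  their  noun  phrases.  This  mirrors  the  a  priori  token-to-referent  link  criticized  above.

```python
# Hypothetical sketch: a CMS approximated as the facts that change between two
# scenes, with arguments linked to noun phrases replaced by position variables.

def to_variables(fact, case_frame):
    """Replace each argument some noun phrase refers to with '?<position>'."""
    position_of = {referent: position for position, referent in case_frame.items()}
    return (fact[0],) + tuple(
        "?" + position_of[arg] if arg in position_of else arg for arg in fact[1:]
    )


def infer_cms(scene_before, scene_after, case_frame):
    """Return (deleted, added): the state change, abstracted over NP arguments."""
    deleted = scene_before - scene_after
    added = scene_after - scene_before
    return (
        frozenset(to_variables(f, case_frame) for f in deleted),
        frozenset(to_variables(f, case_frame) for f in added),
    )


before = {("at", "ball", "table"), ("at", "john", "door")}
after = {("at", "ball", "john"), ("at", "john", "door")}
frame = {"subject": "john", "object": "ball"}  # e.g. "John took the ball"
cms = infer_cms(before, after, frame)
```

Here cms pairs the deleted fact at(?object, table) with the added fact at(?object, ?subject).  One pair suffices to produce a verb meaning,  which  is  exactly  the  single-utterance  behavior  discussed  above.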

5.1.5  Pustejovsky 

Pustejovsky  (1987,  1988)  describes  a  system  called  Tully,  which  also  operates  in  a  fashion  similar  to
Maimra,  Davra,  and  Moran,  learning  word  meanings  from  pairs  of  linguistic  and  non-linguistic  input.
Like  Moran,  Tully  is  given  parsed  utterances  as  input.  Each  utterance  is  associated  with  a  predicate
calculus  description  of  three  parts  of  a  single  event  described  by  that  utterance:  its  beginning,  middle,
and  end.  From  this  input,  Tully  derives  a  thematic  mapping  index,  a  data  structure  representing
the  θ-roles  borne  by  each  of  the  arguments  to  the  main  predicate.  Tully  is  thus  similar  to  Kenunia,
except  that  Tully  derives  the  θ-grids  which  Kenunia  currently  must  be  given  as  prior  language-specific
knowledge.  As  with  Moran,  the  task  faced  by  Tully  is  much  simpler  than  that  faced  by  Maimra,  Davra,
or  Kenunia,  since  Tully  is  presented  with  unambiguous  parsed  input,  is  given  the  correspondence
between  nouns  and  their  referents,  and  does  not  have  to  deal  with  referential  uncertainty,  since  it  is  given
the  correspondence  between  a  single  utterance  and  the  semantic  representation  of  the  event  described
by  that  utterance.  Tully  does  not  learn  language-specific  syntactic  information  or  word-to-category
mappings.  Furthermore,  Tully  implausibly  learns  verb  meanings  from  isolated  utterances  without  any
cross-situational  processing.  Multiple  utterances  for  the  same  verb  cause  Tully  to  generalize  to  the
least  common  generalization  of  the  individual  utterances.  Tully,  however,  goes  beyond  Kenunia  in
trying  to  account  for  the  acquisition  of  a  variety  of  markedness  features  for  θ-roles,  including  [±motion],
[±abstract],  [±direct],  [±partitive],  and  [±animate].
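The  least  common  generalization  of  first-order  terms  is  standardly  computed  by  anti-unification  (Plotkin's  least  general  generalization).  The  Python  sketch  below  illustrates  that  operation  only.  For  simplicity  it  assumes  each  event  description  is  a  single  nested  term  (functor,  arg1,  ...)  rather  than  Tully's  beginning/middle/end  descriptions,  and  the  helper  name  lgg  is  hypothetical.

```python
# Hypothetical sketch: n-ary anti-unification of terms written as nested
# tuples (functor, arg1, ...); atoms are strings, variables are '?X<n>'.

def lgg(terms, table=None):
    """Most specific term that every term in `terms` is an instance of."""
    if table is None:
        table = {}  # identical mismatches share one variable
    first = terms[0]
    if all(t == first for t in terms):
        return first
    same_shape = (all(isinstance(t, tuple) for t in terms)
                  and len({(t[0], len(t)) for t in terms}) == 1)
    if same_shape:
        # Same functor and arity: generalize argument by argument.
        return (first[0],) + tuple(
            lgg([t[i] for t in terms], table) for i in range(1, len(first)))
    key = tuple(terms)
    if key not in table:
        table[key] = "?X%d" % len(table)
    return table[key]


t1 = ("CAUSE", "john", ("GO", "ball", ("TO", "mary")))
t2 = ("CAUSE", "bill", ("GO", "cup", ("TO", "mary")))
t3 = ("CAUSE", "john", ("GO", "ball", ("TO", "bill")))
```

For instance, lgg([t1, t2]) keeps TO(mary) fixed but abstracts the agent and the theme.  Adding  t3  abstracts  the  goal  as  well.  Passing  all  observations  at  once,  instead  of  folding  pairwise,  keeps  variable  names  from  colliding.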

5.1.6  Rayner  et  al. 

Rayner  et  al.  (1988)  describe  a  system  that  uses  cross-situational  techniques  to  determine  the  syntactic
category  of  each  word  in  a  corpus  of  utterances.  They  observe  that  while,  in  the  original  formulation,
a  definite  clause  grammar  (Pereira  and  Warren  1980)  normally  defines  a  two-argument  predicate
parser(Utterance, Tree)  with  the  lexicon  represented  directly  in  the  clauses  of  the  grammar,  an
alternate  formulation  would  allow  the  lexicon  to  be  represented  explicitly  as  an  additional  argument  to
the  parser  relation,  yielding  a  three-argument  predicate  parser(Utterance, Tree, Lexicon).  This  three-argument
relation  can  be  used  to  learn  syntactic  category  information  by  a  technique  summarized  in
    ?- Lexicon = [entry(the,_),
                  entry(cup,_),
                  entry(slid,_),
                  entry(from,_),
                  entry(john,_),
                  entry(to,_),
                  entry(mary,_),
                  entry(bill,_)],
       parser([the,cup,slid,from,john,to,mary],_,Lexicon),
       parser([the,cup,slid,from,mary,to,bill],_,Lexicon),
       parser([the,cup,slid,from,bill,to,john],_,Lexicon).

    Lexicon = [entry(the,det),
               entry(cup,n),
               entry(slid,v),
               entry(from,p),
               entry(john,n),
               entry(to,p),
               entry(mary,n),
               entry(bill,n)].


Figure  5.1:  The  technique  used  by  Rayner  et  al.  (1988)  to  acquire  syntactic  category  information
from  a  corpus  of  utterances.


figure  5.1.  Here,  a  query  is  formed  containing  a  conjunction  of  calls  to  the  parser,  one  for  each  utterance
in  the  corpus.  All  of  the  calls  share  a  common  Lexicon,  while  in  each  call,  the  Tree  is  left  unbound.  The
Lexicon  is  initialized  with  an  entry  for  each  word  appearing  in  the  corpus,  where  the  syntactic  category
of  each  such  initial  entry  is  left  unbound.  The  purpose  of  this  initial  lexicon  is  to  enforce  the  monosemy
constraint  that  each  word  in  the  corpus  be  assigned  a  unique  syntactic  category.  The  result  of  issuing
the  query  in  the  above  example  is  a  lexicon,  with  instantiated  syntactic  categories  for  each  lexical  entry,
such  that  with  that  lexicon,  all  of  the  utterances  in  the  corpus  can  be  parsed.  Note  that  there  could  be
several  such  lexicons,  each  produced  by  backtracking.
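The  same  query  can  be  mimicked  outside  Prolog.  The  Python  sketch  below  assumes  a  hypothetical  four-rule  grammar,  since  the  grammar  used  by  Rayner  et  al.  is  not  reproduced  in  figure  5.1.  It  performs  chronological  backtracking  over  category  assignments  one  word  at  a  time  and  enforces  the  monosemy  constraint  by  giving  each  word  exactly  one  entry.  Unlike  the  Prolog  query,  which  yields  one  lexicon  per  backtrack,  it  collects  every  consistent  lexicon.

```python
from functools import lru_cache

# Hypothetical toy grammar over syntactic categories (not Rayner et al.'s).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n"), ("n",)],
    "VP": [("v", "PP", "PP")],
    "PP": [("p", "NP")],
}
CATEGORIES = ("det", "n", "v", "p")


def parses(categories):
    """True if the category sequence is derivable from S."""
    cats = tuple(categories)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        if symbol not in GRAMMAR:  # a terminal category covers exactly one word
            return j == i + 1 and cats[i] == symbol
        return any(spans(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def spans(rhs, i, j):
        if not rhs:
            return i == j
        return any(derives(rhs[0], i, k) and spans(rhs[1:], k, j)
                   for k in range(i + 1, j + 1))

    return derives("S", 0, len(cats))


def learn_categories(corpus):
    """Enumerate monosemous lexicons under which every utterance parses."""
    utterances = [u.split() for u in corpus]
    words = list(dict.fromkeys(w for u in utterances for w in u))
    solutions = []

    def extend(lexicon, k):
        # Prune as soon as a fully categorized utterance fails to parse.
        for u in utterances:
            if all(w in lexicon for w in u) and not parses(lexicon[w] for w in u):
                return
        if k == len(words):
            solutions.append(dict(lexicon))
            return
        for category in CATEGORIES:  # chronological backtracking
            lexicon[words[k]] = category
            extend(lexicon, k + 1)
            del lexicon[words[k]]

    extend({}, 0)
    return solutions


corpus = ["the cup slid from john to mary",
          "the cup slid from mary to bill",
          "the cup slid from bill to john"]
lexicons = learn_categories(corpus)
```

Like  the  Prolog  version,  this  keeps  the  whole  corpus  in  memory  for  the  entire  search,  which  is  the  property  contrasted  with  Maimra  below.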

Rayner  et  al.  use  a  strong  cross-situational  strategy  which  is  equivalent  to  the  strategy  used  in
section  3.2.  The  Prolog  program  from  figure  5.1  is  a  direct  embodiment  of  the  architecture  depicted
in  figure  2.2.  Part  I  extends  the  work  of  Rayner  et  al.  in  a  number  of  important  ways.  First,  the  system
described  by  Rayner  et  al.  learns  only  word-to-category  mappings  from  a  corpus  consisting  only  of
linguistic  input.  Maimra  and  Davra  learn  word-to-meaning  mappings  in  addition  to  word-to-category
mappings  by  correlating  the  non-linguistic  context  with  the  linguistic  input.  Second,  like  Maimra,  the
system  described  by  Rayner  et  al.  is  given  a  fixed  language-specific  grammar  as  input.  Davra  and
Kenunia  learn  language-specific  grammatical  information  along  with  the  lexicon.  Third,  like  the  first
implementation  of  Davra,  the  system  described  by  Rayner  et  al.  keeps  the  whole  corpus  in  memory
throughout  the  learning  process,  using  a  simple  chronological  backtracking  scheme  to  search  for  a  lexicon
consistent  with  the  entire  corpus.  Maimra  explores  ways  of  representing  the  consistent  language  models
using  disjunctive  lexicon  formulae  so  that  the  corpus  need  not  be  retained  in  memory  to  support  strong
cross-situational  learning.  The  revised  implementation  of  Davra,  along  with  Kenunia,  explores  weaker
learning  strategies  which  also  do  not  retain  the  corpus  in  memory.  Nonetheless,  the  work  of  Rayner  et  al. 
was  strong  early  motivation  for  the  work  described  in  this  thesis. 


5.1.7  Feldman 

Feldman  et  al.  (1990)  have  proposed  a  miniature  language  acquisition  task  as  a  touchstone  problem  for 
cognitive  science.  This  task  is  similar  in  many  ways  to  the  language  learning  task  described  in  part  I  of 
this  thesis,  combined  with  the  visual  perception  task  described  in  part  II  of  this  thesis.  The  proposed 
task  is  to  construct  a  computer  system  with  the  following  capacity.

The  system  is  given  examples  of  pictures  paired  with  true  statements  about  those  pictures
in  an  arbitrary  language.

The  system  is  to  learn  the  relevant  portion  of  the  language  well  enough  so  that,  given  a  new
sentence  of  that  language,  it  can  tell  whether  or  not  the  sentence  is  true  of  the  accompanying
picture.


Feldman  et  al.  go  on  to  specify  an  instance  of  this  general  task,  called  the  L0  problem,  where  the
pictures  are  constrained  to  contain  only  geometric  figures  of  limited  variation  and  the  language  fragment
is  constrained  to  describe  only  a  limited  number  of  spatial  relations  between  those  figures.

Feldman  and  his  colleagues  have  explored  a  number  of  approaches  to  solving  the  L0  problem.  Weber
and  Stolcke  (1990)  describe  a  traditional  symbolic  approach  where  syntactic  knowledge  is  represented  as
a  unification  grammar  and  semantic  information  is  represented  in  first-order  logic.  This  system,  however,
does  not  learn.  It  is  simply  a  query  processor  for  L0  as  restricted  to  English.  Stolcke  (1990)  describes  a
system  which  does  learn  to  solve  the  L0  task.  This  system  is  based  on  simple  recurrent  neural  networks.
The  linguistic  input  to  their  system  consists  of  a  sequence  of  sentences  such  as  A  light  circle  [...]
a  small  square.  These  sentences  are  composed  from  a  vocabulary  containing  nineteen  words.  The
words  are  presented  one  by  one  to  the  network,  each  represented  as  an  orthogonal  19-bit  feature  vector.
The  non-linguistic  input  paired  with  each  sentence  consists  of  a  semantic  representation  of  a  picture
associated  with  that  sentence.  This  semantic  representation  is  encoded  as  a  [...]-bit  feature  vector  of  the
following  form.


Predicate           Argument 1              Argument 2
relation  mod       shape  size  shade      shape  size  shade


Once  trained,  the  network  acts  as  a  map  between  a  sentence  and  its  semantic  representation.  The  words
of  the  sentence  are  presented  to  the  network  one  by  one.  The  semantic  representation  appears  at  the
output  of  the  network  after  the  final  word  has  been  presented  as  input.  The  network  thus  includes
some  feedback  to  model  the  stored  state  during  sentence  processing.  The  network  is  trained  using
backpropagation  while  being  presented  with  positive-only  instances  of  sentences  paired  with  their  correct
semantic  representation.  Thus  their  system  does  not  admit  referential  uncertainty.  The  fragment  of  L0
that  Stolcke  considers  allows  a  total  of  [...]  distinct  sentences.  Of  these,  [...]  were  used  as  training
sentences  and  the  remainder  as  test  sentences.  Stolcke  does  not  report  the  percentage  of  test  sentences
which  his  system  is  correctly  able  to  process,  except  for  stating  that  the  training  set  contained  52  out
of  all  81  possible  'simple  NP  sentences'  and  that  the  system  generalized  correctly  to  the  remaining  29
simple  NP  sentences.  Weber  (1991)  and  Stolcke  (1991)  describe  more  recent  continuations  of  this  work.
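To  make  the  input  and  output  of  such  a  network  concrete,  here  is  a  forward-pass  sketch  of  an  Elman-style  simple  recurrent  network  in  plain  Python.  Everything  specific  is  hypothetical:  the  19-word  vocabulary,  the  hidden-layer  size,  the  output  width,  and  the  random  untrained  weights.  Biases  and  backpropagation  training  are  omitted,  and  nothing  here  reproduces  Stolcke's  actual  configuration.

```python
import math
import random

# Hypothetical 19-word vocabulary; the actual L0 vocabulary is not listed here.
VOCAB = ["a", "light", "dark", "small", "medium", "large", "circle", "square",
         "triangle", "is", "above", "below", "touches", "left", "right", "of",
         "to", "the", "not"]


def one_hot(word):
    """Orthogonal encoding: exactly one bit set per vocabulary word."""
    vec = [0.0] * len(VOCAB)
    vec[VOCAB.index(word)] = 1.0
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class SimpleRecurrentNet:
    """Elman-style forward pass only; training is omitted."""

    def __init__(self, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.5, 0.5)

        self.w_in = [[rand() for _ in VOCAB] for _ in range(n_hidden)]
        self.w_rec = [[rand() for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.w_out = [[rand() for _ in range(n_hidden)] for _ in range(n_out)]

    def read(self, sentence):
        hidden = [0.0] * len(self.w_rec)  # fed-back state between words
        for word in sentence.split():  # words presented one by one
            x = one_hot(word)
            hidden = [math.tanh(dot(wi, x) + dot(wr, hidden))
                      for wi, wr in zip(self.w_in, self.w_rec)]
        # The semantic vector is read off only after the final word.
        return [1.0 / (1.0 + math.exp(-dot(wo, hidden))) for wo in self.w_out]


net = SimpleRecurrentNet(n_hidden=10, n_out=20)
semantic = net.read("a small square is above a large circle")
```

Because  training  pairs  each  sentence  with  exactly  one  target  vector,  the  setup  has  no  slot  for  alternative  candidate  meanings.  That  is  the  sense  in  which  it  does  not  admit  referential  uncertainty.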


5.2  Discussion 

An  ultimate  process  account  of  child  language  acquisition  must  meet  two  criteria.  It  must  be  able  to 
acquire  any  language  which  children  can  acquire,  and  it  must  be  able  to  do  so  for  any  corpus  on  which  a
child  would  be  successful.  It  would  be  very  hard  to  prove  that  any  given  algorithm  meets  these  two  universal
criteria,  since  we  lack  information  which  would  allow  us  to  perform  such  universal  quantification.  We
have  little  information  that  circumscribes  the  child-learnable  languages,  or  the  situations  which  support
that  learnability.  Rather  than  a  formal  proof  of  adequacy,  a  more  reasonable  approach  would  be  to
amass  quantitative  evidence  that  a  given  algorithm  can  acquire  many  different  languages  given  a  variety
of  corpora  in  those  languages.  This  thesis  takes  only  a  first,  exceedingly  modest,  step  in  that  direction,
with  the  demonstration  that  Davra  can  process  very  small  fragments  of  both  English  and  Japanese.  The
longer-term  goal  of  this  research  is  to  extend  this  ability  to  process  larger  corpora  in  different  languages. 
Larger  corpora  are  needed  to  guarantee  that  the  algorithms  scale.  Ideally,  such  corpora  should  consist 
of  transcripts  of  actual  parental  speech  to  children,  instead  of  the  synthetic  text  currently  used. 
Successfully  processing  large  natural  corpora  requires  surmounting  a  number  of  hurdles  in  addition
to  the  problem  of  developing  a  syntactic  theory  capable  of  accounting  for  the  linguistic  phenomena  in
the  corpus.  One  technical  difficulty  is  that  the  learning  strategy  proposed  here  requires  non-linguistic
annotation  for  the  linguistic  input.  (Remember,  "You  can't  learn  a  language  simply  by  listening  to
the  radio.")  Available  transcriptions  do  not  come  with  such  annotations,  at  least  not  annotations  in
the  correct  form  or  which  contain  the  information  needed  to  put  them  in  the  correct  form.  There  is  a
way  around  this  problem.  One  could  use  an  available  dictionary  to  parse  the  corpus  under  a  fixed
set  of  parameter  settings.  Pseudo-semantic  information  can  then  be  derived  from  the  resulting  parse
trees.  The  parse  trees  themselves  can  be  taken  as  meaning  expressions  in  a  Maimra/Davra  framework.
Alternatively,  one  could  construct  a  Kenunia-style  θ-map  by  applying  θ-theory  in  reverse.  Each  noun
could  be  given  a  random  token  as  its  referent.  Other  terminals  would  be  given  ⊥  as  their  referent.  The
θ-criterion  requires  that  complements  of  non-functional  categories  be  θ-marked.  For  each  such  complement
configuration,  a  θ-mapping  is  constructed  matching  a  randomly  chosen  θ-role  to  the  ultimate  referent
of  the  complement.  In  both  of  these  cases,  noise  would  then  be  added  to  model  referential  uncertainty.
For  the  Maimra/Davra  framework,  the  correct  meaning  expression  would  be  added  to  a  set  of  random
alternate  expressions,  possibly  derived  as  perturbations  of  the  correct  meaning.  For  the  Kenunia
framework,  several  other  random  θ-mappings  could  be  added  to  the  θ-map.  The  learning  algorithm
would  then  be  applied  to  this  corpus,  without  access  to  the  dictionary  and  parameter  settings  used
in  its  construction.  The  algorithm  would  be  deemed  successful  if  it  could  accurately  reconstruct  the
dictionary  and  parameter  settings.  This  technique  for  pseudo-semantic  annotation  has  an  added  benefit.
By  varying  the  amount  of  noise  added  to  the  non-linguistic  input,  one  could  analytically  determine  the
sensitivity  of  the  learning  algorithms  to  such  noise.  Such  sensitivity  predictions  could  be  compared  with
actual  sensitivity  measurements  performed  on  children  as  an  experimental  test  of  predictions  made  by
the  theory.
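The  noise-injection  step  for  the  Maimra/Davra  framework  can  be  sketched  as  follows.  The  sketch  assumes  that  meaning  expressions  are  nested  tuples  and  that  a  perturbation  replaces  one  atomic  argument  with  another  token  drawn  from  a  pool.  Both  choices  and  all  helper  names  are  hypothetical.  The  parameter  n_alternates  is  the  knob  one  would  vary  to  measure  a  learner's  sensitivity  to  referential  uncertainty.

```python
import random

# Meaning expressions: nested tuples (functor, arg1, ...); atoms are strings.


def leaf_paths(expr, path=()):
    """Yield the index path of every atomic argument in expr."""
    if isinstance(expr, tuple):
        for i, sub in enumerate(expr[1:], start=1):  # expr[0] is the functor
            yield from leaf_paths(sub, path + (i,))
    else:
        yield path


def get_at(expr, path):
    for i in path:
        expr = expr[i]
    return expr


def replace_at(expr, path, new):
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace_at(expr[i], path[1:], new),) + expr[i + 1:]


def perturb(expr, pool, rng):
    """Replace one randomly chosen atomic argument with a different token."""
    path = rng.choice(list(leaf_paths(expr)))
    old = get_at(expr, path)
    return replace_at(expr, path, rng.choice([t for t in pool if t != old]))


def annotate(meaning, n_alternates, pool, rng, max_tries=1000):
    """Correct meaning plus up to n_alternates distinct perturbed distractors."""
    candidates = {meaning}
    tries = 0
    while len(candidates) < n_alternates + 1 and tries < max_tries:
        candidates.add(perturb(meaning, pool, rng))
        tries += 1
    return candidates


meaning = ("CAUSE", "john", ("GO", "ball", ("TO", "mary")))
pool = ["john", "mary", "bill", "ball", "cup"]
candidates = annotate(meaning, 3, pool, random.Random(1))
```

A learner would then see only the candidate sets, never the dictionary that produced meaning.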

A  much  more  serious  hurdle  remains,  however,  before  the  above  experiment  could  be  attempted.  The
cross-situational  learning  strategy  advocated  in  this  thesis  requires  that  the  learner  find  a  single  grammar
and  lexicon  that  can  consistently  explain  an  entire  corpus.  This  would  be  virtually  impossible  for  natural
corpora,  for  three  reasons.  First,  natural  corpora  contain  ungrammatical  input.  Even  ignoring  input  that
is  truly  ungrammatical,  the  current  state  of  the  art  in  linguistic  theory  is  not  capable  of  accounting  for
many  phenomena  occurring  in  natural  text.  While  such  text  is  grammatical  in  principle,  it  must  be
treated  as  ungrammatical  relative  to  our  meager  linguistic  theories.  Any  strict  cross-situational  learning
strategy  would  fail  to  find  a  language  model  consistent  with  a  corpus  that  contained  ungrammatical  input.
Children,  however,  can  learn  from  input  a  substantial  fraction  of  which  is  ungrammatical.  Second,  a  key
assumption  made  by  each  of  the  systems  discussed  in  part  I  of  this  thesis  was  the  monosemy  constraint,
the  requirement  that  each  word  map  to  a  unique  category  and  meaning.  This  assumption  is  clearly
false.  Polysemy  runs  rampant  in  human  language.  Here  again,  a  strict  cross-situational  strategy  would
fail  to  find  a  consistent  language  model  when  presented  with  a  corpus  that  could  only  be  explained  by
a  polysemous  lexicon.  Children,  however,  have  no  difficulty  learning  polysemous  words.  A  final  hurdle
involves  referential  uncertainty.  What  if  the  set  of  meanings  conjectured  by  the  learner  as  possible
meanings  of  some  observed  utterance  does  not  contain  the  correct  meaning?  This  could  happen  if  the
correct  meaning  of  some  utterance  is  not  readily  apparent  from  its  non-linguistic  context,  or  if  the
learner  incorrectly  discards  the  correct  meaning,  by  some  measure  of  salience,  to  reduce  the  referential
uncertainty  and  make  cross-situational  learning  more  tractable.  In  this  situation  again,  the  learner,  not
knowing  that  the  correct  meaning  was  not  among  those  hypothesized  for  the  utterance,  would  fail  to  find  a  consistent
language  model.

Each  of  these  three  problems  is  symptomatic  of  a  single  more  general  problem:  noise  in  the  input.
Such  noise  can  be  dealt  with  using  a  variety  of  techniques.  One  way  would  be  to  assign  weights  to
different  lexical  entries  and  parameter  settings,  making  the  decision  between  alternative  lexical  entries
and  parameter  settings  a  graded  one,  rather  than  an  absolute  one.  A  scheme  could  be  adopted  for
increasing  the  weights  of  those  alternatives  that  correctly  explain  some  input  while  decreasing  the  weights
of  those  alternatives  that  fail  to  explain  some  input,  ultimately  choosing  those  alternatives  with  the  higher
weight.  In  the  language  acquisition  literature,  such  weights  are  often  confused  with  probabilities.  While
weights  might  have  a  probabilistic  interpretation,  they  need  not  have  one.
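A  toy  Python  sketch  of  such  a  weighting  scheme  follows,  for  a  single  word  whose  alternative  meanings  compete.  It  assumes  that  each  observation  is  the  set  of  meanings  hypothesized  for  one  utterance  containing  the  word  and  that  an  alternative  "explains"  an  observation  when  it  belongs  to  that  set.  The  fixed  step  size  is  arbitrary.  The  third  observation  is  noisy:  the  correct  meaning  is  missing,  so  a  strict  intersection  comes  up  empty,  yet  the  graded  weights  still  single  it  out.

```python
from collections import defaultdict


def weigh_alternatives(observations, step=1.0):
    """Raise the weight of alternatives that explain an observation,
    lower the weight of those that fail to."""
    alternatives = set().union(*observations)
    weights = defaultdict(float)
    for hypothesized in observations:
        for alt in alternatives:
            weights[alt] += step if alt in hypothesized else -step
    return dict(weights)


def choose(weights):
    return max(weights, key=weights.get)


observations = [{"BALL", "TABLE"}, {"BALL", "CHAIR"},
                {"TABLE", "CHAIR"},  # noise: correct meaning absent
                {"BALL", "BOX"}]
strict = set.intersection(*observations)  # strict cross-situational answer
weights = weigh_alternatives(observations)
best = choose(weights)
```

The  weights  here  are  unnormalized  and  can  be  negative,  which  illustrates  that  they  need  not  be  probabilities.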

There  are  alternatives  to  weights.  One  could  instead  find  a  language  model  that  minimizes  violations
of  the  linguistic  theory.  This  offers  a  spectrum  of  alternative  ways  of  counting  violations.  At  one  end  of
the  spectrum,  the  linguistic  theory  can  be  treated  as  a  black  box,  either  capable  or  incapable  of  parsing
an  utterance  given  a  language  model.  With  such  a  theory,  the  learner  would  simply  minimize  the  number
of  utterances  which  could  not  be  parsed.  This  might  not  work  if  the  linguistic  theory  was  so  poor  that  it
could  parse  relatively  few  utterances  in  the  corpus.  A  more  general  approach,  still  using  an  encapsulated
linguistic  theory,  would  be  to  allow  utterances  to  be  parsed  with  minor  perturbations  of  the  language
model  and  choose  the  language  model  which  allowed  the  corpus  to  be  parsed  with  the  minimal  total
associated  cost.  An  even  more  general  approach  would  be  to  have  the  parser  produce  a  quality  measure
as  output.  Successful  parses  would  have  a  high  quality  measure,  while  unsuccessful  parses  would  still
have  a  non-zero  quality  measure  if  they  could  'almost'  be  parsed.  The  quality  measure  could  be  based
on  which  components  of  a  modular  grammatical  theory  were  violated.  In  this  case,  the  learner  would
choose  the  model  which  maximized  the  total  quality  of  the  parsed  corpus.
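The  black-box  end  of  this  spectrum  is  easy  to  sketch.  In  the  Python  toy  below,  the  hypothetical  "linguistic  theory"  is  just  a  set  of  parsable  category  sequences,  and  the  language  model  is  a  monosemous  word-to-category  lexicon.  The  learner  picks  the  lexicon  with  the  fewest  unparsable  utterances.  The  third  utterance  is  ungrammatical,  so  no  lexicon  parses  the  whole  corpus  and  a  strict  strategy  would  return  nothing.

```python
from itertools import product

# Hypothetical black-box theory: the category sequences it can parse.
PATTERNS = {("det", "n", "v"), ("det", "n", "v", "det", "n")}
CATEGORIES = ("det", "n", "v")


def parses(utterance, lexicon):
    return tuple(lexicon[w] for w in utterance.split()) in PATTERNS


def violations(lexicon, corpus):
    """Number of utterances the theory cannot parse under this lexicon."""
    return sum(not parses(u, lexicon) for u in corpus)


def best_model(corpus):
    words = sorted({w for u in corpus for w in u.split()})
    models = (dict(zip(words, cats))
              for cats in product(CATEGORIES, repeat=len(words)))
    return min(models, key=lambda m: violations(m, corpus))


corpus = ["the dog ran", "the cat saw the dog",
          "dog the ran"]  # last utterance is ungrammatical noise
model = best_model(corpus)
```

Exhaustive  enumeration  is  only  viable  for  a  toy  vocabulary.  The  point  is  the  objective,  not  the  search.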

While  these  approaches  can  deal  with  all  forms  of  noise,  it  seems  unreasonable  to  consider  polysemy
as  noise.  A  similar  but  more  plausible  strategy  could  be  used  to  support  polysemy.  The  language  model
could  be  extended  to  allow  polysemous  lexical  entries.  The  cost  of  a  language  model  could  be  defined
so  that  it  measured  the  amount  of  polysemy  in  the  lexicon.  The  learner  could  then  find  the  lowest-cost
language  model,  i.e.  the  one  with  the  least  polysemy,  that  could  still  consistently  account  for  the  corpus.
While  all  of  the  above  approaches  are  conceptually  straightforward,  substantial  details  remain  to  be
worked  out.  This  is  left  for  future  research.
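A  Python  toy  of  this  cost-based  treatment  of  polysemy  follows.  It  assumes  a  hypothetical  set  of  parsable  category  sequences,  lexical  entries  that  are  nonempty  sets  of  categories,  and  a  cost  equal  to  the  number  of  extra  senses.  No  monosemous  lexicon  accounts  for  both  utterances,  since  saw  is  a  verb  in  one  and  a  noun  in  the  other.  The  lowest-cost  consistent  lexicon  gives  saw  exactly  two  senses  and  every  other  word  one.

```python
from itertools import combinations, product

# Hypothetical theory: the category sequences it can parse.
PATTERNS = {("det", "n", "v"), ("det", "n", "v", "det", "n")}
CATEGORIES = ("det", "n", "v")


def sense_sets():
    """Every nonempty set of categories a word could be assigned."""
    return [frozenset(c) for r in range(1, len(CATEGORIES) + 1)
            for c in combinations(CATEGORIES, r)]


def accounts_for(lexicon, utterance):
    """True if some choice of one sense per word yields a parsable sequence."""
    options = [lexicon[w] for w in utterance.split()]
    return any(seq in PATTERNS for seq in product(*options))


def polysemy_cost(lexicon):
    return sum(len(senses) - 1 for senses in lexicon.values())


def least_polysemous(corpus):
    words = sorted({w for u in corpus for w in u.split()})
    best = None
    for choice in product(sense_sets(), repeat=len(words)):
        lexicon = dict(zip(words, choice))
        if all(accounts_for(lexicon, u) for u in corpus):
            if best is None or polysemy_cost(lexicon) < polysemy_cost(best):
                best = lexicon
    return best


corpus = ["the dog saw the cat", "the saw fell"]
lexicon = least_polysemous(corpus)
```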
Part  II

Grounding  Language  in  Perception 




Chapter  6 

Introduction 


Part  II  of  this  thesis  advances  a  theory  of  event  perception.  When  people  observe  the  world  they
can  generally  determine  whether  certain  events  have  happened.  Furthermore,  they  can  describe  those
events  using  language.  For  instance,  after  seeing  John  throw  a  ball  to  Mary,  the  observer  can  say  that
the  event  described  by  the  utterance  John  threw  the  ball  to  Mary  has  happened,  along  with  perhaps
events  described  by  other  utterances.  Part  II  of  this  thesis  suggests  a  mechanism  to  describe  how
event  perception  may  work.  This  mechanism  has  been  partially  implemented  in  a  computer  program
called  Abigail.  Abigail  watches  a  computer-generated  animated  stick-figure  movie  and  constructs
descriptions  of  the  events  that  occur  in  that  movie.  The  input  to  Abigail  consists  solely  of  the  positions,
orientations,  shapes,  and  sizes  of  the  line  segments  and  circles  which  constitute  the  image  at  each  frame
during  the  movie.  Figure  6.1  illustrates  one  frame  of  a  movie  presented  to  Abigail.  From  this  input,
Abigail  segments  the  image  into  objects,  each  object  comprised  of  several  line  segments  and  circles, 
and  delineates  the  events  in  which  those  objects  participate. 

At  the  highest  level,  Abigail  can  be  described  as  a  program  that  takes  an  utterance  and  a  movie
segment  as  input,  and  determines  whether  that  utterance  describes  an  event  that  occurred  during  that 
movie  segment. 

Abigail(u, m)  →  {true, false}

Alternatively,  Abigail  can  be  thought  of  as  a  program  that  takes  a  movie  segment  as  input,  and  produces
utterances  that  describe  the  events  which  occurred  during  that  segment. 

Abigail(m)  →  {u}

Abigail  does  not,  however,  directly  relate  utterances  to  movies.  An  intermediate  semantic  representation
mediates  between  an  utterance  and  a  movie.  For  example,  the  semantic  representation  for  the
utterance  John  threw  the  ball  to  Mary  might  be  CAUSE(John, GO(ball, TO(Mary))).  The  intermediate
semantic  representation  connects  two  halves  of  Abigail.  One  half  relates  the  semantic  representation
to  the  movie  while  the  other  half  relates  it  to  an  utterance.  The  general  architecture  is  depicted  in
figure  6.2.  In  this  architecture,  the  box  labeled  "perception"  relates  semantic  descriptions  to  movies.  It  can
be  thought  of  either  as  a  predicate

perception(s, m)  →  {true, false}

that  determines  whether  the  event  described  by  some  semantic  expression  s  occurred  during  the  movie 
segment  m,  or  alternatively  as  a  function 

perception(m)  →  {s}
Figure  6.2:  A  depiction  of  the  architecture  of  Abigail's  language  faculty.  It  contains  three  processing
modules,  a  parser,  a  linker,  and  a  perceptual  component,  that  mutually  constrain  five  representations:
the  input  utterance,  the  syntax  of  that  utterance,  the  meaning  of  that  utterance,  the  visual
perception  of  events  in  the  world,  and  a  language  model  comprising  a  grammar  and  a  lexicon.  The
lexicon  in  turn  maps  words  to  their  syntactic  category  and  meaning.  Given  observations  and  a
lexicon  as  input,  this  architecture  can  produce  as  output  utterances  which  explain  those  observations.
The  long-term  objective  is  to  combine  Abigail's  perceptual  component  with  the  language
learning  techniques  described  in  part  I  of  this  thesis  to  provide  a  comprehensive  model  of  language
acquisition.  As  a  language  acquisition  device,  when  given  pairs  of  observations  and  utterances  which
explain  those  observations  as  input,  this  architecture  will  produce  as  output  a  language  model  for
the  language  in  which  those  utterances  were  phrased.  Part  I  of  this  thesis  elaborates  on  this  language
acquisition  process.

that  produces  a  set  of  semantic  expressions  describing  those  events  which  occurred  during  the  movie
segment.  The  two  remaining  boxes  in  figure  6.2  relate  the  semantic  representation  to  an  utterance.
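The  two  views  of  the  perception  box,  and  the  way  the  semantic  representation  joins  the  two  halves,  can  be  sketched  in  Python  under  strong  simplifying  assumptions.  All  of  the  following  are  hypothetical  and  are  not  how  Abigail  works:  a  movie  is  a  list  of  frames  mapping  object  names  to  locations,  the  only  perceivable  event  is  GO(object, TO(place)),  and  a  lookup  table  stands  in  for  the  parser  and  linker.

```python
# Hypothetical toy: a "movie" is a list of frames mapping objects to locations.


def perception(movie):
    """Function view: movie -> set of semantic expressions for events seen."""
    first, last = movie[0], movie[-1]
    return {("GO", obj, ("TO", last[obj]))
            for obj in first if first[obj] != last[obj]}


def perceived(s, movie):
    """Predicate view: did the event described by s occur in the movie?"""
    return s in perception(movie)


# Stand-in for the parser and linker: utterance -> semantic expression.
MEANINGS = {
    "the ball rolled to mary": ("GO", "ball", ("TO", "mary")),
    "the cup slid to john": ("GO", "cup", ("TO", "john")),
}


def abigail(u, movie):
    """Abigail(u, m) -> {true, false}, mediated by the semantic representation."""
    return perceived(MEANINGS[u], movie)


def describe(movie):
    """Abigail(m) -> {u}: utterances whose meanings were perceived."""
    events = perception(movie)
    return {u for u, s in MEANINGS.items() if s in events}


movie = [{"ball": "john", "cup": "table"},
         {"ball": "middle", "cup": "table"},
         {"ball": "mary", "cup": "table"}]
```

The  predicate  view  is  derived  from  the  function  view  by  set  membership.  Neither  half  touches  the  other's  data  except  through  semantic  expressions  such  as  ("GO", "ball", ("TO", "mary")).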

The  architecture  depicted  in  figure  6.2  is  a  very  general  mechanism  for  grounding  language  in
perception.  As  discussed  on  page  27,  it  can  support  the  comprehension,  generation,  and  acquisition  of  language.
Part  I  of  this  thesis  focused  on  using  this  architecture  to  support  language  acquisition.  It  described
the  parser  and  linker  modules  in  detail  as  they  related  to  the  language  acquisition  task.  Part  II  of  this
thesis  will  focus  solely  on  the  perception  module,  i.e.  mechanisms  for  producing  semantic  descriptions
of  events  from  (simulated)  visual  input.  The  two  halves  of  this  thesis  discuss  the  two  halves  of  this
architecture  independently  because  the  two  halves  have  not  yet  been  integrated  into
a  single  implementation.  This  integration  awaits  further  research.

Given  the  architecture  in  figure  6.2,  a  natural  first  question  is:  What  is  an
appropriate  intermediate  semantic  representation?  Semantic  representations  are  normally  taken  to  encode
the  meaning  of  an  utterance.  Chapter  7  argues  that  the  notions  of  support,  contact,  and  attachment  are
central  to  defining  the  meanings  of  simple  spatial  motion  verbs  such  as  throw,  pick  up,  put,  and  walk.
For  instance,  throwing  involves  moving  one's  hand  while  grasping  an  object  (attachment),  resulting  in
the  unsupported  motion  of  that  object.  Chapter  7  further  motivates  the  need  for  including  the  notions
of  support,  contact,  and  attachment  as  part  of  a  semantic  representation  scheme  by  demonstrating  the
central  role  these  notions  play  in  numerous  spatial  motion  verbs.  Definitions  for  these  verbs  are  presented
in  a  novel  representation  scheme  that  incorporates  these  notions.  These  definitions  are  compared  with
those  proposed  by  other  researchers  which  do  not  incorporate  such  notions.  I  claim  that  incorporating
the  notions  of  support,  contact,  and  attachment  allows  formulating  more  precise  definitions  of  these
verbs.

If  one  accepts  the  argument  that  the  semantic  representation  should  incorporate  the  notions  of
support,  contact,  and  attachment,  a  second  question  arises:  How  does  one  perceive  support,  contact,  and
attachment  relationships?  An  answer  to  this  question  is  necessary  in  order  to  construct  the  perception
box  from  figure  6.2.  Chapter  8  offers  a  unified  answer  to  that  question:  counterfactual  simulation.  An
object  is  supported  if  it  does  not  fall  when  one  imagines  the  short-term  future.  Likewise,  one  object
supports  another  object  if  the  latter  is  supported  but  loses  that  support  when  one  imagines  the
short-term  future  of  a  world  without  the  former  object.  When  one  object  supports  another,  they  must  be  in
contact  with  each  other.  Furthermore,  two  objects  are  assumed  to  be  attached  to  each  other  if  such
an  attachment  must  be  hypothesized  to  explain  the  fact  that  one  object  supports  the  other.  Chapter  8
elaborates  on  these  ideas.  A  simplified  version  of  these  ideas  has  been  implemented  in  Abigail.  Abigail
uses  counterfactual  simulation  to  determine  the  attachment  relations  between  the  line  segments  and
circles  which  constitute  each  frame  of  the  movie  she  watches.  This  allows  her  to  aggregate  the  line
segments  and  circles  into  objects.  She  then  uses  counterfactual  simulation  to  determine  support,  contact,
and  attachment  relations  between  those  objects.  Chapter  8  also  discusses  some  experiments  performed
by  Freyd  et  al.  (1988)  which  give  evidence  that  human  visual  perception  operates  in  an  analogous  fashion.
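The  two  counterfactual  definitions  of  support  translate  almost  directly  into  code.  The  Python  sketch  below  assumes  a  hypothetical  micro-world  much  simpler  than  Abigail's.  Objects  are  non-overlapping  axis-aligned  boxes  that  can  only  fall  vertically,  and  the  ground  is  at  y  =  0.  Attachment  is  not  modeled,  and  the  line-segment-and-circle  geometry  is  replaced  by  boxes.

```python
# Hypothetical micro-world: name -> (x_min, x_max, bottom, height); ground at y = 0.


def overlaps(a, b):
    """True if two boxes overlap horizontally, so one can rest on the other."""
    return a[0] < b[1] and b[0] < a[1]


def imagine_future(world):
    """Let every object fall until it rests on the ground or an object below."""
    settled = {}
    for name in sorted(world, key=lambda n: world[n][2]):  # lowest first
        x0, x1, bottom, height = world[name]
        rest = max([s[2] + s[3] for s in settled.values() if overlaps(s, world[name])],
                   default=0.0)
        settled[name] = (x0, x1, rest, height)
    return settled


def supported(name, world):
    """Supported: does not fall when the short-term future is imagined."""
    return imagine_future(world)[name][2] == world[name][2]


def supports(a, b, world):
    """a supports b: b is supported, but not in an imagined world without a."""
    without_a = {n: box for n, box in world.items() if n != a}
    return supported(b, world) and not supported(b, without_a)


world = {"table": (0.0, 4.0, 0.0, 3.0),
         "cup": (1.0, 2.0, 3.0, 1.0),
         "ball": (6.0, 7.0, 5.0, 1.0)}  # the ball is floating in mid-air
```

Removing  the  table  and  re-imagining  the  future  is  the  counterfactual  step:  the  cup  would  then  fall  to  the  ground,  so  the  table  supports  the  cup.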

If one accepts the claim that support, contact, and attachment relations are recovered by counter-
factual simulation, a third question then arises: What is the nature of the mechanism used to perform
counterfactual simulation? Nominally, the simulator predicts the behavior of machine-like mechanisms,
parts connected by joints, under the influence of forces such as gravity. Chapter 9 argues, however,
that traditional approaches to kinematic simulation, namely those based on numerical integration, are
inappropriate as cognitive models of the human imagination capacity, since the traditional approaches
take physical accuracy to be primary and collision detection to be secondary. In contrast, human visual
perception appears to take certain naive physical notions to be primary: substantiality, the constraint
that solid objects can't pass through one another, and continuity, the constraint that objects must follow
continuous paths during motion. Chapter 9 presents a kinematic simulator for the micro-world of line
segments and circles which takes substantiality and continuity, along with gravity, to be primary. This
simulator directly encodes such principles, allowing it to predict quickly, in a single step, that an object
will fall precisely the distance required for it to come in contact with the object beneath it. Traditional
simulators based on numerical integration would require many small perturbations to make such a
prediction. While such simulators are more accurate than the simulator described here, and can simulate
a larger class of mechanisms, the simulator described in chapter 9 is much faster and better suited to the
task of discerning support, contact, and attachment relations. Chapter 9 also discusses some experiments
performed by Baillargeon et al. (1985), Baillargeon (1986, 1987), and Spelke (1988) which give evidence
that young infants are sensitive to violations of naive physical constraints such as substantiality and
continuity. The remainder of this chapter describes the event perception task faced by Abigail, since this
task motivates the formulation of the algorithms discussed later in part II of this thesis.
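As a rough illustration of the contrast drawn above, the following Python sketch predicts in one step how far an object falls straight down, and compares that with a naive numerical-integration loop. The loop advances in small time steps and checks for interpenetration after each one. This is not the simulator of chapter 9. Objects are reduced to hypothetical axis-aligned boxes (`Box`), only straight-down falling is modeled, and the names `fall_distance`, `fall_by_integration`, and `supported_against_falling_down` are illustrative assumptions. In this toy setting, a counterfactual test for support against falling down amounts to asking whether the imagined fall distance is zero.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for an object: an axis-aligned extent in x and y."""
    left: float
    right: float
    bottom: float
    top: float


def overlaps_horizontally(a: Box, b: Box) -> bool:
    return a.left < b.right and b.left < a.right


def fall_distance(obj: Box, others: Iterable[Box], ground: float = 0.0) -> float:
    """Single-step prediction: drop obj until its bottom rests on the highest
    surface beneath it (another box's top, or the ground plane)."""
    floor = ground
    for other in others:
        if overlaps_horizontally(obj, other) and other.top <= obj.bottom:
            floor = max(floor, other.top)
    return obj.bottom - floor


def supported_against_falling_down(obj: Box, others: Iterable[Box],
                                   ground: float = 0.0) -> bool:
    """Counterfactual test: imagine releasing obj; it is supported if it would not fall."""
    return fall_distance(obj, others, ground) == 0.0


def penetrates(box: Box, others: Iterable[Box], ground: float) -> bool:
    """Substantiality check: does box overlap the ground or any other box?"""
    if box.bottom < ground:
        return True
    return any(overlaps_horizontally(box, o) and box.bottom < o.top and o.bottom < box.top
               for o in others)


def fall_by_integration(obj: Box, others: Iterable[Box], ground: float = 0.0,
                        dt: float = 0.01, g: float = 9.8) -> Tuple[float, int]:
    """Contrast: semi-implicit Euler steps under gravity, stopping before the first
    step that would cause interpenetration. Returns (distance fallen, steps taken)."""
    others = list(others)
    box, velocity, steps = obj, 0.0, 0
    while True:
        velocity += g * dt
        moved = replace(box, bottom=box.bottom - velocity * dt, top=box.top - velocity * dt)
        steps += 1
        if penetrates(moved, others, ground):
            return obj.bottom - box.bottom, steps
        box = moved
```

Consider a ball whose bottom is at height 4.0, directly above a table whose top is at 2.5. `fall_distance` returns 1.5 immediately. `fall_by_integration` needs dozens of small steps and stops just short of contact.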

6.1  The  Event  Perception  Task 

Abigail is shown a computer-generated animation depicting objects such as tables, chairs, boxes, balls,
and people. During the movie, the objects participate in events. The people walk, pick up, and put down
objects, and so forth. The task faced by Abigail is to determine which events occur and when they
happen. For instance, after a movie segment depicting John walking to the table, she is to produce
a representation of the utterance John walked to the table. For simplicity, the movie shown to Abigail
is a stick figure animation, constructed solely from line segments and circles. These line segments and
circles, collectively called figures, constitute the lowest level structure of the image. Higher-level objects,
such as tables, chairs, and people, are constructed out of collections of figures. Figure 6.1 shows a typical
frame from one of the movies which is shown to Abigail.

The movie shown to Abigail consists of a sequence of such frames containing objects built out of
figures. As the movie progresses, the objects move about and participate in various events. Abigail
is not given any explicit information about the non-atomic entities in the movie. She is not told which
collections of figures constitute objects, nor is she told which events they participate in. Furthermore,
she is not even told what types of objects exist in the world or what types of events can occur. The
only input that Abigail receives is the position, orientation, shape, and size of the figures in each movie
frame.
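To make this input concrete, here is a minimal Python sketch of one frame represented as data. The field names (`kind`, `x`, `y`, `orientation`, `size`) are hypothetical. The text specifies only that position, orientation, shape, and size are given. In this sketch, "shape" is reduced to whether a figure is a line segment or a circle, and `size` is read as a segment's length or a circle's radius. Nothing in a frame records which figures belong to which object; that grouping is what Abigail must recover.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Literal, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Figure:
    """One atomic figure (hypothetical encoding: kind stands in for 'shape')."""
    kind: Literal["line", "circle"]
    x: float            # center of the segment or circle
    y: float
    orientation: float  # radians; irrelevant for circles
    size: float         # segment length, or circle radius


Frame = List[Figure]  # a frame is just an unstructured list of figures


def endpoints(fig: Figure) -> Tuple[Point, Point]:
    """Endpoints of a line segment, derived from its center, orientation, and length."""
    if fig.kind != "line":
        raise ValueError("only line segments have endpoints")
    dx = math.cos(fig.orientation) * fig.size / 2
    dy = math.sin(fig.orientation) * fig.size / 2
    return (fig.x - dx, fig.y - dy), (fig.x + dx, fig.y + dy)


def count_by_kind(frame: Frame) -> Dict[str, int]:
    """Tally figures by kind, e.g. circles versus line segments in a frame."""
    return dict(Counter(fig.kind for fig in frame))
```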

Abigail faces a two-stage task. First, she must recover a description of the objects and events
occurring in the movie, solely from information about the constituent figures. Second, she must form
a mapping between the recovered object and event representations and the linguistic utterances which
describe those events. To date, only part of the first task has been accomplished; the second task
has not been attempted. Part II of this thesis, therefore, addresses only the first task. It proposes a
novel approach to the task of event perception and presents, in detail, the mechanisms underlying this
approach. As discussed in chapter 1, the long-term goal of this research is to use the object and event
representations recovered by Abigail as the non-linguistic input to language acquisition models such
as those described in part I of this thesis. Linking models of language acquisition to models of event
perception would allow a comprehensive study of the acquisition of word meanings in a way which is not
possible without perceptual grounding of those word meanings.

The perceptual mechanisms used by Abigail to recover object and event descriptions are very
general. Unlike some prior approaches, they do not incorporate any knowledge that is specific to any
class of objects or events. Thus, they do not contain models of particular objects such as tables or
particular events such as walking. The intention is that the same unaltered perceptual mechanism be
capable of recovering reasonable object and event descriptions from any movie constructed out of line
segments and circles.

In order to verify whether Abigail's unaltered perceptual mechanisms are indeed capable of analyzing
any movie, a simple movie construction tool was created to facilitate the generation of numerous movies
with which to test Abigail. This tool takes a script and generates the positions, orientations, shapes,
and sizes of the figures at each frame during the movie. While the script itself delineates objects and
events, the perceptual mechanisms of Abigail have no access to the representation of objects and events
in the script and must recover the object and event information solely from the positions, orientations,
shapes, and sizes of the figures in the movie generated from the script.

A sample movie script is shown in figure 6.3. This script generates a movie consisting of 1063 frames,
the first of which is depicted in figure 6.1. Each frame is constructed from 43 figures: 5 circles and
38 line segments. These figures form caricatures of 7 objects: a table, two chairs, a box, a ball, a man,
and a woman. The script of this movie is simple and fairly boring. The man, John, walks over to the
table and picks up the ball. He turns around and walks back to his original position. He then turns
around again, walks back to the table, puts the ball down on the table, turns around, and walks back
to his original position. The woman, Mary, then performs a similar task. Finally, John walks toward
the table, picks up the ball, carries it over to Mary, and gives it to her. He then turns around and walks
back to his place, after which Mary walks toward the table, puts the ball on the table, and returns to her
place. Figure 6.4 depicts the general sequence of events in this movie by showing a selection of several
key frames from the movie.

The original expectation was that Abigail would be able to successfully process numerous movies.
That goal was overly ambitious. Most of the development of Abigail was driven by only one movie, the
one generated by the script in figure 6.3 and depicted in figure 6.4. In fact, due to computer processing
limitations and to the current incomplete state of Abigail's implementation, only a portion of that
(define-movie movie1
    ((table (make-instance 'table :name 'table :x 16.0 :y 0.0 :world world))
     (chair1 (make-instance 'chair :name 'chair1 :x 12.0 :y 0.0 :world world))
     (chair2 (make-instance 'chair :name 'chair2 :x 20.0 :y 0.0 :direction -1.0 :world world))
     (box (make-instance 'box :name 'box :x 18.0 :y 2.525 :world world))
     (ball (make-instance 'ball :name 'ball :x 14.0 :y 3.0 :world world))
     (john (make-instance 'man :name 'john :x 3.0 :y 0.0 :world world))
     (mary (make-instance 'woman :name 'mary :x 30.0 :y 0.0 :direction -1.0 :world world)))
  (walk-to john (x (center ball)))
  (pick-up (left-hand john) ball)
  (about-face john)
  (walk-n-steps john 4)
  (walk-to john (x (center table)))
  (put-down (left-hand john)
            (x (center table))
            (+ (y (point1 (top table))) (size (circle ball))))
  (about-face john)
  (walk-n-steps john 4)
  (about-face john)
  (walk-to mary (x (center ball)))
  (pick-up (left-hand mary) ball)
  (about-face mary)
  (walk-n-steps mary 5)
  (walk-to mary (x (center table)))
  (put-down (left-hand mary)
            (x (center table))
            (+ (y (point1 (top table))) (size (circle ball))))
  (about-face mary)
  (walk-n-steps mary 5)
  (about-face mary)
  (walk-to john (x (center ball)))
  (pick-up (right-hand john) ball)
  (walk-to john (x (center mary)))
  (give (right-hand john) (left-hand mary))
  (about-face john)
  (walk-n-steps john 9)
  (walk-to mary (x (center table)))
  (put-down (left-hand mary)
            (x (center table))
            (+ (y (point1 (top table))) (size (circle ball))))
  (about-face mary)
  (walk-n-steps mary 5)
  (about-face mary))


Figure  6.3:  A  script  used  to  generate  a  movie  to  be  watched  by  Abigail.  The  first  frame  of  this 
movie  is  shown  in  figure  6.1.  The  general  sequence  of  events  in  this  movie  is  depicted  by  the  selection 
of  frames  in  figure  6.4. 
movie has been successfully processed and analyzed by Abigail. Future work will attempt to extend
the results described in this thesis by running Abigail on other movies.

6.2  Outline 

The remainder of part II of this thesis contains four chapters. Chapter 7 advances the claim that the
notions of support, contact, and attachment play a pivotal role in defining the prototypical meanings
of simple spatial motion verbs. It surveys past attempts at defining the meanings of many such verbs,
finding these attempts inadequate. An alternative representation scheme is put forth which highlights
the notions of support, contact, and attachment. Chapter 8 proposes a computational mechanism,
implemented in Abigail, for perceiving support, contact, and attachment relations. It advances the
claim that such relations are not recovered by static analysis of images but rather require counterfactual
simulation. Chapter 9 suggests that the simulation performed as part of event perception differs from
traditional kinematic simulation in that it takes the naive physical notions of substantiality, continuity,
gravity, and ground plane to be primary, and physical accuracy and coverage to be secondary. It describes,
in detail, the novel kinematic simulator that acts as Abigail's imagination capacity. Chapter 10 discusses
related work and concludes with an outline of potential future work.


Chapter  7 

Lexical  Semantics 


Part II of this thesis advances a theory of event perception. It proposes a mechanism for how people
visually recognize the occurrence of events described by simple spatial motion verbs such as throw, walk,
pick up, and put. The proposed recognition process is decompositional. Each event type is successively
broken down into more basic notions that ultimately can be grounded in perception. For instance, a
throwing event comprises two constituent events: moving one's hand while grasping an object, followed
by the unsupported motion of that object. The words grasping and unsupported play a pivotal role in
this description of throwing. An event would not typically be described as throwing if it did not involve
the grasping and releasing of an object along with the resulting unsupported motion. Many prior
approaches to defining the meaning of the word throw (e.g. Miller 1972, Schank 1973, Jackendoff 1983,
and Pinker 1989), however, do not highlight this pivotal role. In this chapter, I advance the claim that
the notions of support, contact, and attachment are central to describing many common spatial motion
events. Accurately delineating the occurrence of such events from non-occurrences hinges on the ability
to perceive support, contact, and attachment relationships between objects in the world. In chapters 8
and 9, I offer a theory of how to ground the perception of these relations.

A central assumption of this work is that perception is intimately tied to language. We use words
and utterances to describe events that we perceive. The meaning of a word is typically thought of
as conditions on its appropriate use. It thus seems natural to relate the meaning of a word such as
throw to a procedure for detecting throwing events. Many schemes have been proposed for representing
the meanings of words and utterances (cf. Miller, Schank, Jackendoff, and Pinker). I will show that
these schemes cannot be taken as procedures for recognizing the events that they attempt to describe
because they lack the notions of support, contact, and attachment. Accordingly, I propose a different
representation scheme that incorporates these notions into definitions of word meanings. The central
focus of this work is the ability to recognize events by grounding the notions of support, contact, and
attachment. Therefore, the representation scheme developed here exaggerates the role played by these
notions.

For the remainder of this chapter, I will discuss the meanings of a number of spatial motion verbs. I
will show how prior definitions proposed for these verbs cannot be used as event recognition procedures.
For each verb I will then propose an alternative definition that highlights the role played by the notions
of support, contact, and attachment in characterizing the events described by that verb.

Consider the word throw. The Random House dictionary (Stein et al. 1975) offers the following
definition for throw:


throw  v.t.  1.  to  propel  or  cast  in  any  way  esp.  to  project  or  propel  from  the  hand  by  a 
sudden  forward  motion  or  straightening  of  the  arm  and  wrist 


This definition comprises two parts: a general condition and a more prototypical situation. Both of
these, however, admit events which one would not normally consider to be throwing events, for instance,
rolling a bowling ball down a bowling lane. A sign at a bowling alley that said "Please do not throw balls
down the alley" does not consider rolling a bowling ball to be throwing. The difference lies in whether
or not the resulting motion is unsupported.

Miller offers the following definition for throw:

to apply force by hand to cause to travel through air

At first glance, it appears that Miller is attempting to capture the notion of support through the
statement through air. We might take the statement through air not as literally meaning "through air",
which would admit supported motion through the air, but as a gloss for unsupported motion. But
elsewhere Miller groups through air along with through water and on land as the medium of
motion. Miller defines swim as to travel through water and walk as to travel on land by
foot. Furthermore, as we shall see, the glosses given by Miller for other words whose definitions
require the notion of support do not incorporate the through air primitive.
Schank offers the following two definitions for throw:

(i) X throw Z at Y:

    X <=> PROPEL <-o- Z <-D- (to Y, from X)

(ii) X throw Z to Y:

    X <=> PTRANS <-o- Z <-D- (to Y, from X)
      instrument:  X <=> PROPEL <-o- Z <-D- (to Y, from X)

The first describes throwing as propelling an object Z on a path from the agent X to the destination Y.
The second appears to add the statement that Z must actually reach Y to be thrown to its destination.
Neither of these definitions mentions the unsupported nature of the resulting motion.

Jackendoff offers the following gloss for the statement Beth threw the ball out the window:

CAUSE(Beth, GO(ball, OUT(window)))

While in this example the unsupported nature of the resulting motion is implied by the fact that the
ball is being thrown out the window, nothing in the representation conveys this information. If one
takes CAUSE(x, GO(y, z)) as the meaning of throw, this definition admits many non-throwing events.

Pinker offers the following definition for the word throw via the gloss for the statement Bob
threw the box to Bill:¹


¹For typographical reasons, I have omitted the time-line component of Pinker's representations. It is not relevant
to the current discussion. The method for annotating effect and for/to branches is altered somewhat as well, again for
typographical reasons.




throw:

    [EVENT ... (box) ... [TO [THING (Bill)]]]


This gloss encapsulates the distinction between throwing and non-throwing events in the manner attribute
"throwing". Since this is an uninterpreted symbol, it offers little help in building a procedure
for recognizing throwing events.

In short, none of the representation schemes proposed by Miller, Schank, Jackendoff, and Pinker
contains a primitive for describing support. Thus, in these schemes, one could not reformulate better
definitions around the notion of support without adding such a primitive. Pinker (p. 201) gives the
following definition for the word support:

    [STATE ACT([THING], [THING]), prevent [EVENT GO([THING], [PATH down])]]


but does not recognize the need to incorporate this structure as part of the definitions of other words
which depend on support.

The definitions for throw given by Schank, Jackendoff, and Pinker also do not mention the role played
by one's hand in throwing an object. Numerous non-throwing events, such as kicking an object, or bumping
into an object causing it to fall, would satisfy the above definitions even though they are not prototypical
throwing events. Random House and Miller attempt to capture this requirement via the statements
"from the hand" and "by hand". Even these do not express the notion that prototypical throwing involves
grasping an object and subsequently releasing it. Combined with not specifying unsupported motion,
not specifying this grasping-releasing transition allows all of the definitions for throw given by Miller,
Schank, Jackendoff, and Pinker to admit many non-throwing events such as pushings, pullings, and
carryings. In fact, even the Random House definition would suffer from this problem were it not for the
words "or cast" appended to "propel" in its definition for throw.

In contrast, I propose the following alternative definition for throw:
(define throw (x y)
  (exists (i j)
    (and (during i (move (hand x)))
         (during i (move y))
         (during i (contacts (hand x) y))
         (during i (attached (hand x) y))
         (during j (not (contacts (hand x) y)))
         (during j (not (attached (hand x) y)))
         (during j (move y))
         (during j (not (supported y)))
         (= (end i) (beginning j)))))

Informally, this states that a throwing event comprises two consecutive time intervals i and j, where
during i, both x's hand and y are moving, and x's hand is in contact with and attached to y, while
during j, x's hand is no longer in contact with or attached to y, and y is in unsupported motion.

Note that this definition incorporates the grasping and releasing action of the agent followed by the
unsupported motion of the patient, aspects of throwing not captured by the definitions advanced
by Miller, Schank, Jackendoff, and Pinker. I will not formally define the notation used for defining
words. In fact, I have taken some liberty with the notation, sacrificing precision in favor of expository
simplicity. What I hope to convey, however, is the belief that if one could ground the notions of support,
contact, and attachment, in addition to movement, one could use the above definition as a procedure
for perceiving throwing events.
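Under some simplifying assumptions, the definition above can be read as a recognition procedure. The Python sketch below makes four assumptions:

- The primitives have already been perceived.
- They are given, per frame, as a set of hypothetical atomic facts such as `("attached", "hand(john)", "ball")`.
- Time is a sequence of discrete frames.
- `(during i P)` means that P holds at every frame of i.

Under these assumptions, abutting intervals i and j exist exactly when some frame satisfying all of i's conditions is immediately followed by a frame satisfying all of j's conditions. The search therefore only scans adjacent frames, then widens i backward and j forward as far as their conditions continue to hold. The helper names (`fact`, `neg`, `find_two_phase`, `throw`) are illustrative and are not part of Abigail.

```python
from typing import Callable, List, Optional, Set, Tuple

Fact = Tuple[str, ...]
Frame = Set[Fact]
Condition = Callable[[Frame], bool]


def fact(*atom: str) -> Condition:
    """Condition true of a frame that contains the given atomic fact."""
    return lambda frame: atom in frame


def neg(cond: Condition) -> Condition:
    return lambda frame: not cond(frame)


def holds(conds: List[Condition], frame: Frame) -> bool:
    return all(c(frame) for c in conds)


def find_two_phase(frames: List[Frame], during_i: List[Condition],
                   during_j: List[Condition]) -> Optional[Tuple[int, int, int]]:
    """Find i = frames[start:boundary] and j = frames[boundary:end] such that
    (= (end i) (beginning j)); return (start, boundary, end), or None."""
    for boundary in range(1, len(frames)):
        if holds(during_i, frames[boundary - 1]) and holds(during_j, frames[boundary]):
            start = boundary - 1
            while start > 0 and holds(during_i, frames[start - 1]):
                start -= 1
            end = boundary + 1
            while end < len(frames) and holds(during_j, frames[end]):
                end += 1
            return start, boundary, end
    return None


def throw(frames: List[Frame], x: str, y: str) -> Optional[Tuple[int, int, int]]:
    """The definition of throw above, transcribed condition by condition."""
    hand = f"hand({x})"
    during_i = [fact("move", hand), fact("move", y),
                fact("contacts", hand, y), fact("attached", hand, y)]
    during_j = [neg(fact("contacts", hand, y)), neg(fact("attached", hand, y)),
                fact("move", y), neg(fact("supported", y))]
    return find_two_phase(frames, during_i, during_j)
```

In one test movie, John's hand swings while holding the ball, and the ball then moves unsupported. For that movie, `throw` returns the frame boundaries of i and j. For a carrying sequence in which the hand never lets go, it returns `None`.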

I should stress that I do not advance such a definition as embodying the necessary and sufficient
conditions for the use of the word throw. Even ignoring metaphorical and idiomatic uses, the word throw
can be extended to a variety of situations. The above definition attempts to describe only prototypical
throwing events. It is a well-known philosophical quagmire to attempt to formally circumscribe the
meaning of a word or even to characterize prototypical events and their extensions. To avoid such
difficulties, I will simply say that the definitions presented here try to capture our intuitive notions of
the events they describe better than prior representations do. I offer no way to substantiate this claim
except for the projected eventual success in using these definitions as part of an implemented computer
program to accurately differentiate occurrences from non-occurrences of the events they describe in
animated movies. Since the implementation of that program is still underway, I can only hope to
convince the reader that the mechanisms I propose in part II of this thesis show some actual promise
of achieving these aims. One should note that neither Miller, Schank, Jackendoff, nor Pinker offers any
better substantiation of their respective representation schemes.

I also want to point out a number of issues pertaining to the above definition and others like it. First,
it does not specify precisely when the throwing event occurred. For most verbs like throw, it is unclear
whether the actual event described spanned both i and j, just i or j, some portion of either i or j, or just
the transition between i and j. The notation intentionally leaves this question unanswered in the absence
of suitable criteria for determining the appropriate solution. The intention is to interpret the notation
as stating that the event occurred sometime during the interval spanning i and j, given that the criteria
for i and j are met. Second, the definition does not express certain other notions that we intuitively
believe to be part of throwing events, for instance, x's hand imparting force to y during i, or that force
causing the unsupported motion during j. Clearly, notions such as force application and causality play an
important role in the meaning of most spatial motion verbs. I leave such notions out of definitions simply
because I do not yet know how to perceptually ground them. Section 10.2 will offer some speculation
on how the methods described in part II of this thesis can be extended to support perception of force
application and causality, allowing such notions to be included in revised definitions for verbs like throw.
Finally, the above definition contains redundant information. Stating that x's hand is attached to y
during i implies that it contacts y during that interval as well. Likewise, stating that x's hand is moving
during i, while it is attached to y, implies that y must also be moving during that interval. Furthermore,
stating that y is unsupported during j implies that x's hand is neither in contact with, nor attached to,
y during that interval. I include such redundant information for two reasons. First, it may allow more
robust detection of events given unreliable primitives. Second, the redundant prototypical definition is
more suitable for extension to non-prototypical situations. For example, throwing that does not involve
unsupported motion of an object still involves the release of that object at some point during its motion.
Perhaps some variant of structure mapping (Gentner 1983, Falkenhainer et al. 1989) applied to such
redundant definitions can form a basis for generalizing prototype definitions to idiomatic, metaphorical,
and other extended uses (cf. Lakoff 1987).

Leaving these and many other subtleties aside, then, let us examine some other verbs for which
support, contact, and attachment play an important role. Consider the verbs fall, drop, bounce, and
jump. Miller gives the following definitions for these words:

fall:  to  travel  downward 

drop:  to  cause  to  travel  downward 

bounce: to travel up and down

jump:  to  travel  over 

These definitions do not seem to capture the meanings of these words accurately, since they lack the notions
of support, contact, and attachment. Falling is unsupported motion. One is not falling when one is
walking down stairs. Dropping must result in falling. One is not dropping a teacup when one is gently
placing it onto its saucer. Furthermore, not just any causation of falling counts as dropping. Pushing
or knocking an object off a ledge is not dropping that object. Dropping an object requires that the
agent grasp, or at least support, that object prior to its falling. Bouncing seems to involve
temporary contact more than up-and-down motion. One can bounce a ball horizontally against a wall.
Furthermore, not all up-and-down motion is bouncing. A book is not bouncing when one picks it up
and puts it down somewhere else. Jumping, too, seems to involve support, in particular a self-induced
state change from being supported to being unsupported, typically incorporating upward motion. One
need not travel over something to successfully jump.

Schank gives the following definitions for fall and drop:

    X fall:    nf <=> PROPEL <-o- Z

    X drop Z:  X <=> GRASP <-o- Z
                     | r
               nf <=> PROPEL <-o- Z


These require only that nf, the natural force of gravity, propel an object toward the ground, and do
not require the object to be unsupported. They would thus admit a situation where one is lowering a
bucket into a well as a case where one dropped the bucket and it is falling.

In contrast, I propose the following definitions for the verbs fall, drop, bounce, and jump:

(define fall (x)
  (exists (i)
    (and (during i (not (supported x)))
         (during i (move-down x)))))

(define drop (x y)
  (exists (i j)
    (and (during i (contacts (hand x) y))
         (during i (attached (hand x) y))
         (during i (supports x y))
         (during i (supported y))
         (during j (not (contacts (hand x) y)))
         (during j (not (attached (hand x) y)))
         (during j (not (supports x y)))
         (during j (not (supported y)))
         (during j (move-down y))
         (= (end i) (beginning j)))))

(define bounce (x)
  (exists (i j k y)
    (and (during i (not (contacts x y)))
         (during j (contacts x y))
         (during k (not (contacts x y)))
         (= (end i) (beginning j))
         (= (end j) (beginning k))
         (short j))))

(define jump (x)
  (exists (i j)
    (and (during i (supported x))
         (during j (not (supported x)))
         (during j (moving-up x))
         (= (end i) (beginning j)))))

Intuitively,  these  definitions  state  that  falling  involves  unsupported  downward  motion,  that  dropping 
involves  releasing  a  previously  grasped  object  allowing  it  to  fall,  that  bouncing  involves  temporary 
contact  and  that  jumping  involves  the  transition  from  being  supported  to  unsupported  upward  motion. 
Again,  they  are  not  meant  as  necessary  and  sufficient  conditions  on  the  use  of  these  words,  only  as 
descriptions  of  prototypical  events.  More  importantly,  they  can  be  used  as  procedures  for  recognizing 
occurrences  of  the  events  they  describe. 
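The bounce definition differs from throw in two ways: it quantifies over a second object y, and it requires the middle interval to be short. The following Python sketch works in the same spirit under these assumptions:

- Time is a sequence of frames, each given as a hypothetical set of facts.
- Contact is recorded as `("contacts", x, y)` with x first.
- A hypothetical threshold `max_contact_frames` stands in for the undefined predicate `short`.

A maximal run of contact is automatically bracketed by non-contact frames. It therefore suffices to find a brief run of contact between x and some y that neither starts at the first frame nor runs to the last one; the surrounding frames then play the roles of intervals i and k.

```python
from typing import Iterator, List, Optional, Set, Tuple

Fact = Tuple[str, ...]
Frame = Set[Fact]


def contact_runs(frames: List[Frame], x: str, y: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each maximal run of frames in which x contacts y."""
    start = None
    for t, frame in enumerate(frames):
        touching = ("contacts", x, y) in frame
        if touching and start is None:
            start = t
        elif not touching and start is not None:
            yield start, t
            start = None
    if start is not None:
        yield start, len(frames)


def bounce(frames: List[Frame], x: str,
           max_contact_frames: int = 2) -> Optional[Tuple[str, int, int]]:
    """Return (y, start, end) for a brief contact of x with some y that is
    preceded and followed by non-contact (intervals i and k), else None."""
    candidates = sorted({f[2] for frame in frames for f in frame
                         if len(f) == 3 and f[0] == "contacts" and f[1] == x})
    for y in candidates:
        for start, end in contact_runs(frames, x, y):
            if start > 0 and end < len(frames) and end - start <= max_contact_frames:
                return y, start, end
    return None
```

Treating `short` as a fixed frame count is only a placeholder. The text leaves open how brief a contact must be to count as bouncing.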

There seems to be no single unified notion of support. The intuitive concept of support breaks down
into at least three variant notions, each corresponding to a different way an object can fall. An object
can fall straight downward, fall over pivoting about a point beneath its center of mass, or slide down
an inclined plane. Whether or not an object is supported in one way, preventing one type of falling,
may be independent of whether it is supported in a different way. Figure 7.1 illustrates several different
potential support situations for an object. In figure 7.1(a), the object is totally unsupported and will fall
down. In figure 7.1(b), the object is prevented from falling down but will fall over. In figure 7.1(c), the
object is prevented from falling down but can either fall over or slide. In figure 7.1(d), the object will
neither fall down nor fall over but will slide. In figure 7.1(e), the object is totally supported and will not
fall down, fall over, or slide. Differences in type of support appear to play a role in verb meaning. For
instance, throwing seems to require that an object be able to fall down, or at least fall over, as in The
wrestler threw his opponent to the floor. An event is not throwing if it results in unsupported sliding
motion. Similarly, falling, dropping, and jumping most prototypically involve the ability to fall down
but may be extended to cases of falling over and perhaps even to sliding. Other verbs are sensitive to
this distinction in different ways. For instance, the verb lean on can be used only to describe situations


Figure  7.1:  The  different,  varieties  of  support  relationships.  In  (a),  the  object  is  totall.v  unsupported 
and  will  fall  down.  In  (b),  the  object  is  prevented  from  falling  down  but  will  fall  over.  In  (c).  the 
object  is  prevented  from  falling  down  but  can  either  fall  over  or  slide.  In  (d).  the  object  will  neither 
feill  down  nor  fall  over  but  will  slide.  In  (e).  the  object  is  totally  supported  and  will  not  fall  down, 
fall  over,  or  slide. 


where  one  object  prevents  another  from  falling  over,  and  not  when  one  object  prevents  another  from 
falling  down.  One  is  not  leaning  on  the  floor  when  one  is  standing  on  it. 
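The five situations of figure 7.1 and the two verb constraints above can be made concrete with a small table. The following Python sketch is illustrative only. The constants, the table, and the two helper functions are hypothetical names, and each verb test encodes only the prototypical constraint stated in the text.

```python
# Hypothetical encoding of figure 7.1: each situation maps to the set of
# ways the object could still fall.
FALL_DOWN, FALL_OVER, SLIDE = "fall-down", "fall-over", "slide"

FIGURE_7_1 = {
    "a": {FALL_DOWN},           # totally unsupported
    "b": {FALL_OVER},           # prevented from falling down, will fall over
    "c": {FALL_OVER, SLIDE},    # can either fall over or slide
    "d": {SLIDE},               # will only slide
    "e": set(),                 # totally supported
}

def throwable(possible_falls):
    """Throwing needs the object to be able to fall down or at least fall
    over; unsupported sliding alone does not qualify."""
    return bool(possible_falls & {FALL_DOWN, FALL_OVER})

def lean_on_applies(prevented_by_other):
    """'Lean on' describes one object preventing another from falling over,
    not merely from falling down."""
    return FALL_OVER in prevented_by_other
```

Standing on the floor prevents only falling down, so `lean_on_applies({FALL_DOWN})` is false. This matches the observation that one is not leaning on the floor when standing on it.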

Consider now the verb put. Miller (p. 359) defines put as to cause to travel. Jackendoff (p. 179) offers

CAUSE(man, GO(book, TO ON(table)))

as the meaning of The man put the book on the table. Pinker (p. 180) gives the following fragment of a definition for put.


[Diagram not recoverable from the source: a structure with nodes labeled EVENT, THING, PATH, and PLACE.]

All of these definitions involve causing an object to move to a destination. Such a definition is overly general. Jackendoff's expression would be true of an event where the man knocked the book off the shelf onto the table, yet one would not say that he put the book there. Put seems to require the ability to control the precise final destination of an object. One does not usually have such control when one throws or kicks an object, so one doesn't use the word put to describe such situations. One way to achieve greater positional control is by grasping or otherwise supporting an object while moving it. Furthermore, positional control is achieved only if the object is supported at the end of the put event. This support must come from something other than the hand which moved it; otherwise, the object has not yet reached its final destination. These aspects of put, at least, can be captured using the machinery described here with the following definition.
(define put (x y)
  (exists (i j z)
    (and (during i (move (hand x)))
         (during i (contacts (hand x) y))
         (during i (attached (hand x) y))
         (during i (supports x y))
         (during i (move y))
         (during j (not (move y)))
         (during j (supported y))
         (during j (supports z y))
         (not (equal z (hand x)))
         (= (end i) (beginning j)))))
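These definitions are meant to double as recognition procedures, so it may help to see one evaluated. The Python sketch below checks the put definition by brute force over a discretized movie. It rests on several hypothetical encodings that do not come from Abigail itself:

- each frame is already given as a set of ground atoms such as ("supports", "john", "book");
- an interval is a run of consecutive frames;
- (= (end i) (beginning j)) means that j starts on the frame after i ends.

```python
from itertools import product

def intervals(n):
    """All non-empty intervals (b, e), inclusive, in a movie of n frames."""
    return [(b, e) for b in range(n) for e in range(b, n)]

def during(movie, interval, holds):
    """True if the predicate `holds` is true of every frame in the interval."""
    b, e = interval
    return all(holds(movie[t]) for t in range(b, e + 1))

def meets(i, j):
    # Assumption: interval j begins on the frame right after i ends.
    return i[1] + 1 == j[0]

def put(movie, x, y, objects):
    """Search all meeting interval pairs (i, j) and candidate supporters z."""
    h = ("hand", x)
    held_and_moving = {("move", h), ("contacts", h, y), ("attached", h, y),
                       ("supports", x, y), ("move", y)}
    for i, j in product(intervals(len(movie)), repeat=2):
        if not meets(i, j):
            continue
        if not during(movie, i, lambda f: held_and_moving <= f):
            continue
        if not during(movie, j, lambda f: ("move", y) not in f
                                           and ("supported", y) in f):
            continue
        # (exists z): something other than the hand supports y throughout j
        if any(z != h and during(movie, j, lambda f, z=z: ("supports", z, y) in f)
               for z in objects):
            return True
    return False

# Frame 0: John's hand grasps and moves the book. Frame 1: the table holds it.
movie = [
    {("move", ("hand", "john")), ("contacts", ("hand", "john"), "book"),
     ("attached", ("hand", "john"), "book"), ("supports", "john", "book"),
     ("move", "book"), ("supported", "book")},
    {("supported", "book"), ("supports", "table", "book"),
     ("contacts", "book", "table")},
]
print(put(movie, "john", "book", ["table"]))  # True
```

If the first frame is replaced by one where the book merely moves without being held, as when it is knocked onto the table, the check fails. This mirrors the objection to the purely causal definitions above.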

Similarly, the prototypical event described by pick up can be expressed as essentially the inverse operation.

(define pick-up (x y)
  (exists (i j z)
    (and (during i (supported y))
         (during i (supports z y))
         (during i (contacts z y))
         (during j (move (hand x)))
         (during j (contacts (hand x) y))
         (during j (attached (hand x) y))
         (during j (supports x y))
         (during j (move y))
         (not (equal z (hand x)))
         (= (end i) (beginning j)))))

Many other simple spatial motion verbs also apparently involve support. Consider carry and raise. Miller (p. 355) defines these words as follows.

carry:  to  cause  to  travel  with  self 
raise:  to  cause  to  travel  up 

Jackendoff (p. 184) defines raise as

CAUSE(x, GO(y, [Path UPWARD])).

One would say Larry Bird raised the ball into the basket to describe a layup but not a jump shot, even though he has caused upward motion of the basketball in either case. One must be continually supporting an object, perhaps indirectly, to be raising it. This holds even more strongly for the verb lift. Likewise, one is not carrying a baby stroller when one is pushing or pulling it, even though one is causing it to travel with oneself.¹ The statement Don't drag that box, carry it! would be infelicitous if the prototypical carrying event admitted dragging. Accordingly, I offer the following alternate definitions for carry and raise.

(define carry (x y)
  (exists (i)
    (and (during i (move x))
         (during i (move y))
         (during i (supports x y)))))

¹The Halakhic concept of carrying notwithstanding.


(define raise (x y)
  (exists (i)
    (and (during i (supports x y))
         (during i (move-up y)))))


The verbs described so far highlight the need for support in their definitions. Support, however, is not the only crucial component of verb meaning. Contact and attachment also play a pivotal role. This is illustrated in the simple verbs slide and roll. Pinker (p. 182) offers the following representations for the intransitive use of roll.


The uninterpreted manner attribute offers no guidance as to the perceptual mechanisms needed to detect rolling and thus to define the meaning of the word roll. A proper definition of rolling can be based on a definition of sliding, since rolling occurs when sliding doesn't. One object slides against another object if they are in continual contact and one point of one object contacts different points of the other object at different instants. The notion of one object sliding against another can be represented in the notation used here by reducing it to primitives that return the points of contact between objects. I prefer instead to treat slide-against as a primitive notion, much like support, contact, and attachment. I conjecture that the human visual apparatus contains innate machinery for detecting sliding motion. Experiments like those performed by Freyd and Spelke, to be described in sections 8.3 and 9.5, could be used to determine the validity of this claim. Given the primitive notion slide-against, one could then define the intransitive verb slide as follows.


(define  slide  (x)  (exists  (i  y)  (during  i  (slide-against  x  y)))) 

Rolling  motion  can  then  be  described  as  occurring  in  any  situation  where  an  object  is  rotating  while  it 
is  in  contact  with  another  object  without  sliding  against  that  object. 

(define roll (x)
  (exists (i y)
    (and (during i (not (slide-against x y)))
         (during i (rotate x))
         (during i (contacts x y)))))
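The text notes that slide-against could instead be reduced to primitives returning points of contact. For comparison only, the Python sketch below performs that reduction. It assumes a hypothetical log that lists, for each sampled instant, the pairs of points (one on each object) that are touching. The thesis itself treats slide-against as primitive. All function names here are illustrative.

```python
def _touches(contacts, side):
    """Map each point of one object (side 0 = x, side 1 = y) to the set of
    points of the other object that it touched at any sampled instant."""
    touched = {}
    for frame in contacts:
        for pair in frame:
            touched.setdefault(pair[side], set()).add(pair[1 - side])
    return touched

def in_continual_contact(contacts):
    """True if the two objects touch at every sampled instant."""
    return len(contacts) > 0 and all(len(frame) > 0 for frame in contacts)

def slides_against(contacts):
    """contacts[t] is the set of (point_on_x, point_on_y) pairs touching at
    instant t. Sliding: continual contact, and some point of either object
    touches more than one point of the other object."""
    if not in_continual_contact(contacts):
        return False
    return any(len(others) > 1
               for side in (0, 1)
               for others in _touches(contacts, side).values())

def rolls(contacts, rotating):
    """Rolling: rotating throughout, in continual contact, and not sliding."""
    return (all(rotating) and in_continual_contact(contacts)
            and not slides_against(contacts))
```

A rolling wheel brings a new wheel point onto a new ground point at each instant, so no point accumulates more than one partner. A skidding block or a wheel spinning in place does accumulate partners, so both count as sliding.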

Accurately representing the transitive uses of slide and roll, however, requires the notion of causality. Since this thesis does not offer a theory for grounding the perception of causality, I will not attempt to formulate definitions for these transitive uses. It is interesting to note, however, that despite this inability to describe causality, many verbs described so far are nonetheless causal verbs. They can be described fairly accurately without recourse to causality due to the availability of other cues such as support, contact, and attachment.

So far, the primary use of the notion of attachment has been to describe grasping. Levin (1985, 1987) suggests that there is an entire class of verbs of attachment, including attach, fasten, bolt, glue, nail, and staple. I want to suggest another potential role attachment might play in verb meaning beyond this class of attachment verbs. Two other verb classes suggested by Levin are verbs of creation and verbs of destruction. The typical way of representing such verbs is via a change in the state of existence of some object. Thus Schank proposes the following definitions for make and break.
X make Z:     X <=> DO
                  ⇑ r
              Z <=> BE

X break Z:    X <=> DO
                  ⇑ r
              Z <=> broken


To the same end, Jackendoff proposes the existential field and primitives like GO_Exist and [EX]. Similarly, Pinker offers the following definitions for make (p. 223)²


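make: [Diagram not recoverable from the source: an EVENT structure ending in the state "at EXISTENCE".]

and break (p. 206).

break: [Diagram not recoverable from the source: an EVENT structure with labels ACT, THING, THING, effect, EVENT, and ident, whose resulting state is "broken".]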


Like uninterpreted manner attributes, a symbol like [EX] offers little guidance in grounding the concepts of creation and destruction. I do not suggest that we are anywhere close to being able to fully ground these concepts. Still, the notion of attachment may allow a modest start in the right direction. Objects are constructed from components that are typically attached to each other to form the aggregate parent object. One can make an object by forming attachments between appropriate components. One can break an object by severing those attachments. Chapter 8 describes how Abigail models objects as collections of attached line segments and circles. Attachments between line segments and circles can be made and broken during the course of the movie. Abigail can track the formation and dissolution of attachment relationships dynamically during event perception. This is how Abigail can detect graspings and releasings. This same mechanism can be used to determine that a new object has been constructed out of some components, or that an object has been broken into its pieces. Such low-level notions may form the basis of more complete explanations for creation and destruction by way of a long chain of analogical reasoning. Whether such speculation leads anywhere remains for future research.

²I have omitted the benefactive component of Pinker's original definition as it is tangential to the current discussion. Pinker also phrased the original definition as a gloss for the utterance Bob made a hat. I have replaced the tokens (Bob) and (hat) from the original gloss with the variables X and Y.
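The suggestion that making and breaking amount to forming and severing attachments can be sketched as a comparison of the connected components of an attachment graph before and after an event. This Python sketch is speculative, in keeping with the text. The part names, the union-find helper, and the rule "fewer aggregates means make, more means break" are all assumptions for illustration, not Abigail's mechanism.

```python
def components(parts, attachments):
    """Aggregate objects = connected components of the attachment graph,
    computed with a small union-find."""
    parent = {p: p for p in parts}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in attachments:
        parent[find(a)] = find(b)
    groups = {}
    for p in parts:
        groups.setdefault(find(p), set()).add(p)
    return {frozenset(g) for g in groups.values()}

def classify_change(parts, before, after):
    """Hypothetical reading: attachments that merge parts into fewer
    aggregates suggest 'make'; severed ones that yield more suggest 'break'."""
    n_before = len(components(parts, before))
    n_after = len(components(parts, after))
    if n_after < n_before:
        return "make"
    if n_after > n_before:
        return "break"
    return "neither"

parts = ["seat", "leg1", "leg2", "back"]
chair = [("seat", "leg1"), ("seat", "leg2"), ("seat", "back")]
print(classify_change(parts, [], chair))  # make
```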

As a final example, I will present the definition of a verb that is seemingly much more complex perceptually. Schank gives the following definition for walk.

X walk to Z:   X <=> PTRANS <-o- X <-D- Z

This  definition,  however,  admits  running,  hopping,  skipping,  jumping,  skating,  and  bicycling  events. 
We  can  consider  walking  to  involve  a  sequence  of  steps.  Each  step  involves  lifting  up  some  foot  off  the 
ground  and  placing  it  back  on  the  ground. 

(define step (x)
  (exists (i j k y)
    (and (during i (contacts y ground))
         (during j (not (contacts y ground)))
         (during k (contacts y ground))
         (equal y (foot x))
         (= (end i) (beginning j))
         (= (end j) (beginning k)))))

In addition to stepping, walking involves motion. Furthermore, two conditions can be added to distinguish walking from running, hopping, skipping, and jumping on the one hand, and from skating on the other. The first stipulates that at all times during walking, at least one foot must be on the ground. The second stipulates that no sliding takes place.

(define walk (x)
  (exists (i)
    (and (during i (repeat (step x)))
         (during i (move x))
         (during i
           (exists (y)
             (and (equal y (foot x))
                  (contacts y ground))))
         (during i
           (not (exists (y)
                  (and (equal y (foot x))
                       (slide-against y ground))))))))

Taken  together,  this  is  a  fairly  accurate  description  of  walking. 
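The step and walk definitions can also be read as a recognizer over sampled foot contacts. The Python sketch below is a simplification under stated assumptions:

- each frame is a hypothetical dict recording whether the walker is moving and which feet are touching or sliding against the ground;
- a step is any off-ground run of one foot between two on-ground runs;
- (repeat (step x)) is approximated by requiring at least `min_steps` steps, an arbitrary threshold.

```python
from itertools import groupby

def count_steps(on_ground):
    """Count lift-and-replace cycles in one foot's per-frame contact
    sequence: each off-ground run between two on-ground runs is a step."""
    runs = [value for value, _ in groupby(on_ground)]
    return sum(1 for k in range(1, len(runs) - 1)
               if not runs[k] and runs[k - 1] and runs[k + 1])

def walk(frames, feet=("left", "right"), min_steps=2):
    """frames: one dict per frame with keys 'moving' (bool), 'on_ground'
    (set of feet touching the ground) and 'sliding' (set of sliding feet)."""
    if not frames or not all(f["moving"] for f in frames):
        return False
    # At every instant at least one foot is on the ground
    # (rules out running, hopping, skipping, and jumping).
    if not all(f["on_ground"] for f in frames):
        return False
    # No foot slides against the ground (rules out skating).
    if any(f["sliding"] for f in frames):
        return False
    # Stand-in for (repeat (step x)): at least min_steps steps in total.
    steps = sum(count_steps([foot in f["on_ground"] for f in frames])
                for foot in feet)
    return steps >= min_steps
```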

All of the discussion so far has focused on using semantic representations for event perception. The ultimate goal of this research, however, is to link language with perception using the architecture from figure 6.2. For a semantic representation to act as an appropriate bridge between the linguistic and non-linguistic halves of this architecture, it must simultaneously meet criteria imposed by both halves. Linguistic processing imposes a strong constraint not addressed so far. It must be possible to specify a way of combining representations of the meanings of words to form the representation of the meaning of an utterance comprising those words. Such a process is called a linking rule. The choice of linking rule depends on the representation used. A linking rule appropriate for one representation might not be suitable for another. Jackendoff, Pinker, and Dorr (1990a, 1990b) adopt a substitution-based linking rule. With this rule, word meanings are taken to be expressions with variables acting as place holders for a word's arguments. The meaning of a phrase is composed by taking some constituent in that phrase as the head and substituting the meanings of the remaining constituents for variables in the head's meaning. Figure 7.2 illustrates an example application of this linking rule. This rule can be thought of simply as β-substitution, one of the rewrite rules introduced as part of the λ-calculus. Such a linking rule is suitable for Jackendovian representations and the derivatives of them used by Pinker and Dorr. It is unsuitable, however, for the representation proposed here. This can be illustrated by the following example. Consider the utterance John dropped the book on the floor. For simplicity, let's take the meanings of John, the book, and the floor to be john, book, and floor respectively. Earlier, I took the meaning of drop to be as follows.

(define drop (x y)
  (exists (i j)
    (and (during i (contacts (hand x) y))
         (during i (attached (hand x) y))
         (during i (supports x y))
         (during i (supported y))
         (during j (not (contacts (hand x) y)))
         (during j (not (attached (hand x) y)))
         (during j (not (supports x y)))
         (during j (not (supported y)))
         (during j (move-down y))
         (= (end i) (beginning j)))))

One could apply simple substitution to link john with x and book with y. That technique will not work, however, with the prepositional phrase on the floor in the above utterance. The desired expression to represent the meaning of the entire utterance would look something like the following.

(exists (i j K)
  (and (during i (contacts (hand john) book))
       (during i (attached (hand john) book))
       (during i (supports john book))
       (during i (supported book))
       (during j (not (contacts (hand john) book)))
       (during j (not (attached (hand john) book)))
       (during j (not (supports john book)))
       (during j (not (supported book)))
       (during j (move-down book))
       (DURING K (CONTACTS BOOK FLOOR))
       (DURING K (SUPPORTS FLOOR BOOK))
       (DURING K (SUPPORTED BOOK))
       (= (end i) (beginning j))
       (= (END J) (BEGINNING K))))
John slid the cup from Mary to Bill:  CAUSE(John, GO(cup, [Path FROM(Mary), TO(Bill)]))
    John:  John
    slid the cup from Mary to Bill:  CAUSE(x, GO(cup, [Path FROM(Mary), TO(Bill)]))
        slid:  CAUSE(x, GO(y, [Path u, v]))
        from Mary:  FROM(Mary)
            from:  FROM(x)
            Mary:  Mary
        to Bill:  TO(Bill)
            to:  TO(x)
            Bill:  Bill

Figure 7.2: A derivation of the meaning of the utterance John slid the cup from Mary to Bill from the meanings of its constituent words using the linking rule proposed by Jackendoff.
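The substitution-based linking rule illustrated in figure 7.2 amounts to replacing variables in the head's meaning with the meanings of its dependents. The Python sketch below mimics that derivation. It assumes, hypothetically, that meanings are nested tuples and that variables are strings starting with "?". The ("PATH", ...) tuple stands in for Jackendoff's [Path ...] bracket.

```python
def substitute(expr, bindings):
    """Replace variables (strings starting with '?') by bound meanings,
    recursing through nested tuples; other atoms are left unchanged."""
    if isinstance(expr, str):
        return bindings.get(expr, expr)
    return tuple(substitute(e, bindings) for e in expr)

# Hypothetical lexical entries in the style of figure 7.2.
SLID = ("CAUSE", "?x", ("GO", "?y", ("PATH", "?u", "?v")))
FROM = ("FROM", "?x")
TO = ("TO", "?x")

from_mary = substitute(FROM, {"?x": "Mary"})
to_bill = substitute(TO, {"?x": "Bill"})
vp = substitute(SLID, {"?y": "cup", "?u": from_mary, "?v": to_bill})
sentence = substitute(vp, {"?x": "John"})
print(sentence)
# ('CAUSE', 'John', ('GO', 'cup', ('PATH', ('FROM', 'Mary'), ('TO', 'Bill'))))
```

Each dependent contributes a complete expression that fills exactly one slot. That property is what fails for on the floor in the drop example.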


While it is unclear precisely what to take as the meaning of the preposition on, what it does structurally in the above example is contribute three things: a new interval K to the existential quantifier, some added conjuncts describing support and contact relationships between the book and the floor, and an added conjunct to temporally constrain the new interval relative to prior intervals. These additions appear in upper case in the above semantic representation. Whatever we take as the meaning of on the floor, it is not a piece of structure that is substituted for a single variable in some other structure. Furthermore, the new structure contributed by on the floor must itself have variables which are linked to elements, such as book, from the structure to which it is linked. Thus substitution-based linking rules are not suitable for the type of representation discussed here.
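To contrast with substitution, the structural contribution of on the floor can be sketched as an operation that extends the head's meaning rather than filling one of its variables. This Python sketch is purely hypothetical. The thesis does not commit to a meaning for on or to a linking rule. The (variables, conjuncts) pair encoding, the function names, and the abridged drop entry are illustrative assumptions.

```python
def drop(x, y):
    """Abridged meaning of drop as (existential variables, conjuncts); only
    a few of the conjuncts from the definition of drop above are kept."""
    return (["i", "j"],
            [("during", "i", ("attached", ("hand", x), y)),
             ("during", "j", ("not", ("supported", y))),
             ("during", "j", ("move-down", y)),
             ("=", ("end", "i"), ("beginning", "j"))])

def link_on(meaning, figure, ground, prior):
    """Hypothetical linking step for 'on <ground>': add an interval k, add
    conjuncts mentioning `figure` (an argument already inside the head's
    structure), and place k right after the head's interval `prior`."""
    variables, conjuncts = meaning
    k = "k"
    return (variables + [k],
            conjuncts + [("during", k, ("contacts", figure, ground)),
                         ("during", k, ("supports", ground, figure)),
                         ("during", k, ("supported", figure)),
                         ("=", ("end", prior), ("beginning", k))])

variables, conjuncts = link_on(drop("john", "book"), "book", "floor", "j")
print(variables)  # ['i', 'j', 'k']
```

The linking step needs access to both the head's argument book and its interval j. No single-variable substitution can supply both.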

There is much talk in the linguistic literature about linking rules which are claimed to be innate and universal (cf. Pinker 1989). Such claims can be valid only if the actual semantic representation used by the brain is of a form that allows such linking rules to apply. These claims must be revised if it turns out that the semantic representation must be more like that discussed here. Consider the following example. A common claim is that a universal linking rule stipulates that agents are subjects. An additional claim is that the first argument to the CAUSE primitive is an agent (cf. Jackendoff 1990). Using extensions that will be described in section 10.2, the primitive notion (supports x y) can be viewed as something like (cause x (supported y)). In this case, x would be an agent and thus would be a subject. Consider, however, the utterance John leaned on the pole. In the representation considered here, this would correspond to (supports pole john), or equivalently (cause pole (supported john)). This would require pole to be an agent and thus a subject, contrary to English usage. Thus the claimed universal linking rule and the semantic representation considered here are incompatible. The universal linking rule can be valid only if we find a compatible representation which also allows grounding meaning in perception.
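The argument can be traced mechanically. In this illustrative Python sketch, `as_cause` applies the rewrite of (supports x y) into (cause x (supported y)) described above. `agent` applies the claimed rule that the first argument of CAUSE is the agent. Both function names are hypothetical.

```python
def as_cause(expr):
    """Rewrite (supports x y) as (cause x (supported y)); other
    expressions are returned unchanged."""
    if expr[0] == "supports":
        _, x, y = expr
        return ("cause", x, ("supported", y))
    return expr

def agent(expr):
    """Claimed universal rule: the first argument of CAUSE is the agent."""
    return expr[1] if expr[0] == "cause" else None

# "John leaned on the pole" corresponds to (supports pole john).
leaned = as_cause(("supports", "pole", "john"))
print(agent(leaned))  # pole
```

The rule that agents are subjects would then predict pole as the subject. The English subject of John leaned on the pole is John, so the two claims cannot both hold under this representation.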

Borchardt (1984) recognizes the need to incorporate the notions of support, contact, and attachment into procedures for recognizing simple spatial motion events. He describes a system that recognizes such events in a simulated micro-world containing a robot hand and several objects. That system receives the changing coordinates of those objects as input. Figure 7.3 illustrates several event recognition procedures suggested by Borchardt for that micro-world. His definitions and notation differ in specific details from the definitions and notation suggested here. Nevertheless, we share the same intent of describing spatial motion events using the notions of support, contact, and attachment. The major difference is that Borchardt's system receives the changing support, contact, and attachment relationships between objects as input, while Abigail infers such relationships from lower-level perceptual input.

To summarize, this chapter has advanced the claim that the notions of support, contact, and attachment play a central role in defining the meanings of numerous simple spatial motion verbs. These notions are necessary to construct procedures which can differentiate between occurrences and non-occurrences of prototypical events which these verbs describe. I have shown how prior lexical semantic representations lack the ability to represent these notions, and are thus incapable of making the requisite distinctions. Furthermore, I have proposed an alternate representation which not only incorporates these notions into verb definitions, but does so in a prominent fashion. This new representation is useful only if one can show how to ground the notions of support, contact, and attachment in visual perception. The next two chapters will propose a theory of how such grounding may work.


(defun slide (a b)
  (and (dsupport table a)
       (translate a)
       (not (roll a))))
p. [illegible]

(defun roll (a b)
  (and (dsupport a)
       (dsupport a)
       (translate a)
       (or (isa a ball)
           (and (isa a cylinder)
                (perpendicular i (heading a i) (orientation a p i))))))
p. [illegible]

(defun fall (a)
  (and (< (ddt (position a z)) -10)
       (not (exists i hand (control i a)))))
p. 99

(defun bounce (a b)
  (and (moveaway a b)
       (hit a b justbefore (start (moveaway a b)))
       (< (abs (ddt (velocity b))) 3)))
p. [illegible]

(defun control (a b)
  (and (not (dsupport table b))
       (or (hold a b)
           (support a b)
           (exists i object (and (hold a i) (support i b))))))
p. [illegible]

(defun raise (a b)
  (and (control a b) (< (ddt (position b z)) -0.5)))
p. [illegible]

(defun pickup (a b)
  (and (movefingers a)
       (not (control a b))
       (at (ever (control a b)
                 (start (and (movefingers a) (not (control a b)))))
           (next (stop (movefingers a))))))
p. [illegible]

(defun setdown (a b)
  (and (movefingers a)
       (control a b)
       (at (ever (not (control a b))
                 (start (and (movefingers a) (control a b))))
           (next (stop (movefingers a))))))
p. 110

(defun drop (a b)
  (and (fall b) (justbefore (control a b) (start (fall b)))))
p. 110

Figure 7.3: A selection of representations of verbs used by Borchardt to detect occurrences of events described by those verbs in a simulated blocks world with a robot arm. The page numbers indicate where the representation appeared in Borchardt (1984).
Chapter 8

Event Perception


In chapter 7, I argued that the notions of support, contact, and attachment play a central role in defining the meanings of numerous spatial motion verbs. If this is true, the ability to perceive occurrences of events described by those verbs rests on the ability to perceive these support, contact, and attachment relations. In this chapter I advance a theory of how this might be accomplished. The central claim of this chapter is that support, contact, and attachment relations can be recovered using counterfactual simulation: imagining the short-term future of a potentially modified image under the effects of gravity and other physical forces. For instance, one determines that an object is unsupported if one imagines it falling. Likewise, one determines that an object A supports an object B if B is supported, but falls when one imagines a world without A. An object A is attached to another object B if one must hypothesize such an attachment to explain the fact that one object supports the other. A similar, though slightly more complex, mechanism is used to detect contact relationships. All of the mechanisms rely on a modular imagination capacity. This capacity takes the representation of a possibly modified image as input and predicts the short-term consequences of such modifications. It determines whether some predicate P holds in any of the series of images depicting the short-term future. The imagination capacity is modular in the sense that the same unaltered mechanism is used for a variety of purposes, varying only the predicate P and the initial image model between calls. To predict the future, the imagination capacity embodies physical knowledge of how objects behave under the influence of physical forces such as gravity. For reasons to be discussed in chapter 9, such knowledge is naive and yields predictions that differ substantially from those that accurate physical modeling would produce. Section 10.2 speculates about how the imagination capacity might also contain naive psychological knowledge modeling the mental state of agents in the world, and how such knowledge might form the basis of the perception of causality. Chapter 9 discusses the details of the mechanism behind the imagination capacity. This chapter first presents a computational model of how such a capacity can be used to perceive support, contact, and attachment relations. It then presents experimental evidence suggesting that such mechanisms might form the basis of human perception of these notions.
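The counterfactual support test can be illustrated with a deliberately crude imagination capacity. The Python sketch below assumes a hypothetical world of axis-aligned boxes in which the only imagined motion is falling straight down. It therefore ignores falling over and sliding, and it is not the mechanism of chapter 9. It shows only the pattern: an object is supported if imagination leaves it in place, and A supports B if B is supported but would fall in an imagined world without A.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    name: str
    x0: float  # left edge
    x1: float  # right edge
    y0: float  # bottom edge (0 is the ground plane)
    y1: float  # top edge

def rests_on(top, bottom):
    """top's lower face touches bottom's upper face with horizontal overlap."""
    return top.y0 == bottom.y1 and top.x0 < bottom.x1 and bottom.x0 < top.x1

def imagine_not_falling(world):
    """Naive imagination: names of boxes that would not fall straight down.
    A box stays put if it rests on the ground plane or on a box that stays
    put (computed as a least fixpoint)."""
    stays = set()
    changed = True
    while changed:
        changed = False
        for b in world:
            if b.name not in stays and (
                    b.y0 == 0 or any(o.name in stays and rests_on(b, o)
                                     for o in world if o is not b)):
                stays.add(b.name)
                changed = True
    return stays

def supported(world, name):
    return name in imagine_not_falling(world)

def supports(world, a, b):
    """Counterfactual test: b is supported now, but would fall without a."""
    without_a = [o for o in world if o.name != a]
    return supported(world, b) and not supported(without_a, b)

world = [Box("A", 0, 2, 0, 1), Box("B", 0, 2, 1, 2), Box("C", 5, 6, 3, 4)]
print(supports(world, "A", "B"))  # True
```

The same imagination routine is called unchanged with a modified world, which echoes the modularity claim above. Only the input world and the queried predicate vary between calls.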

Certain notions seem to pervade human perception of the world. We know that solid objects cannot pass through one another. This has been termed the substantiality constraint. We know that objects do not disappear and then later reappear elsewhere. When an object moves from one location to another, it follows a continuous path between those two locations. This has been termed the continuity constraint. We know that unsupported objects fall and that the ground acts as universal support for all objects. I will refer to these latter two facets of human perception as gravity and ground plane. Section 9.5 will review experiments performed by Spelke (1988) and her colleagues that give evidence that at least two of the above notions, namely substantiality and continuity, are present in humans from very early infancy. This chapter, along with chapter 9, argues that substantiality, continuity, gravity, and ground plane are central notions that govern the operation of an imagination capacity which is used to recover support, contact, and attachment relations from visual input. Recovery of these relations, in turn, forms the basis of event perception and the grounding of language in perception.


8.1 The Ontology of Abigail's Micro-World

Before presenting the details of a computational model of event perception, it is necessary to describe the ontology which Abigail uses to interpret the images she is given as input.

The real world behaves according to the laws of physics. Beyond these laws, people project an ontology onto the world. It may be a matter of debate as to which facets of our perceived world should be attributed to physics, and which to our conceptualization of it, but such philosophical questions do not concern us here. In either case, our world contains, among other things, solid objects. These objects have mass. They are located and oriented in three-dimensional Cartesian space. Solid objects obey the principles of substantiality, continuity, gravity, and ground plane. That is, solid objects do not pass through one another, they follow a continuous path through space when moving between two points, they fall unless they are supported, and they are universally supported by the ground. Subject to these constraints (and perhaps others), solid objects can change their position and orientation, they can touch one another, they can be fastened to one another, and they can be broken into pieces. Those pieces can eventually be refastened to form either the same object or different objects. Complex objects can be constructed out of parts which have been fastened together. The relative motion of such parts can be constrained to greater or lesser degrees.

The aforementioned story is a small but important fragment of human world ontology. On this view, we all share roughly the same conceptual framework, around which much of language is structured. The non-metaphoric meanings of many simple spatial motion verbs depend on this shared ontology. For example, the verb sit incorporates, among other things, the notion of support, which in turn is built on the notions of gravity and substantiality. But this alone does not suffice. Sit also incorporates several further notions:

- our body has limbs as parts;
- these limbs are joined to our torso;
- these joints impose certain constraints on the relative motion of our body parts;
- these constraints allow us to assume certain postures which facilitate the support of our body.

Furthermore, many nouns such as chair derive at least part of their meaning from the role they play in events referred to by words like sit. So a chair must facilitate support of the body in the sitting posture. A little introspection will reveal that the aforementioned fragment is a necessary, and perhaps almost sufficient, ontology for describing numerous word meanings, including those discussed in chapter 7.

Like the real world, Abigail's micro-world has an ontology, though this ontology is derived mostly via projection of Abigail's perceptual processes onto a world governed by very few physical laws. This ontology is analogous to that of the real world though it differs in some of the details. Abigail's micro-world contains objects that have mass and are located and oriented in a 2½-dimensional Cartesian space. These objects obey substantiality, continuity, gravity, and ground plane. They can move, touch, support, and be fastened to one another. They can break into pieces, and those pieces can be refastened. The relative motion of pieces fastened together can be constrained, so that an object constructed out of parts can have a posture which can potentially change over time. Most of the words discussed in chapter 7 can be interpreted relative to the alternate ontology of Abigail's micro-world, rather than the real world. Such a re-interpretation maintains the general conceptual organization of the lexicon, in that a person would use the same word sit to describe analogous events in the movie and the real world. Furthermore, the ontological analysis projected by Abigail onto a sitting event in the movie is identical to the analysis projected by a person watching a sitting event in the real world, even though the low-level primitives out of which those analyses are constructed differ. This allows Abigail's micro-world to act as a simplified though non-trivial testbed for exploring the relationship between language and perception.
The aforementioned ontology is not implemented in Abigail as explicit declarative knowledge. Instead, it is embedded procedurally in an imagination capacity to be described in chapter 9. The event perception mechanisms described in this chapter, and ultimately any language processing component which these mechanisms drive, rely on this ontology through the imagination capacity. Although the ontology possessed by humans differs from this artificial ontology in its details, if the general framework for event perception incorporated into Abigail is reflective of actual human event perception, then human event perception too must ultimately rely on a world ontology. I should stress that I remain agnostic on the issue of whether such an ontology, and the mechanisms for its use, are innate or acquired. Nothing in this thesis depends on the outcome of that debate. All that is assumed is that the ontology and mechanisms for its use are in place prior to the onset of any linguistic ability based on the link between linguistic and perceptual processes. A particular consequence of this assumption is the requirement that the ontology and mechanisms for its use be in place prior to the onset of language acquisition, since the models described in part 1 of this thesis rely on associating each input utterance with semantic information denoting the potential meanings of that utterance recovered from the non-linguistic context.

This ontology may be represented redundantly, and differently, at multiple cognitive levels. I find no reason to assume that this ontology is represented uniformly in the brain at a single cognitive level. The representation used for imagination, a low-level process, might differ from representations at higher levels. The ontology used for low-level imagination during visual perception may differ both in its implementation and in its predictive force from any other ontology we possess, in particular the one we discover through introspection. Different ontologies may be acquired via different means at different times. Furthermore, it is plausible for some to be innate while others are acquired. To me, in fact, this seems to be the most likely scenario.


8.1.1  Figures 


At the lowest level, the world that Abigail perceives is constructed from figures. I will denote figures with the (possibly subscripted) symbols $f$ and $g$. In the current implementation, figures have one of two shapes, namely line segments and circles. Conceivably, Abigail could be extended to support additional shapes, such as conic section arcs and polynomial arcs, though the complexity of the implementation would grow substantially without increasing the conceptual coverage of the theory.¹

At each movie frame Abigail is provided with the position, orientation, shape, and size of every figure. Positions are points in the Cartesian plane of the movie screen. I assume that the camera does not move. Thus an object is stationary if and only if the coordinates of the positions of its figures do not change. The (possibly subscripted) symbols $p$ and $q$ will denote points. Each point $p$ has two coordinates, $x(p)$ and $y(p)$.

The position of a figure $f$ is specified by two points, $p(f)$ and $q(f)$. For line segments, these are its two endpoints. For circles, $p(f)$ is its center while $q(f)$ is a point on its perimeter. The orientation and size of figures are derived from these points. Given two points, $p$ and $q$, the orientation of the line from $p$ to $q$ is given by²


$$\theta(p, q) = \tan^{-1}\frac{y(q) - y(p)}{x(q) - x(p)}$$


The orientation of a figure, whether it be a line segment or a circle, is an angle $\theta(f) = \theta(p(f), q(f))$.³ Throughout the implementation of Abigail, all angles $\theta$, including the orientations of figures, are normalized so

¹ In retrospect, even allowing circles unduly complicated the implementation effort. Little would be lost by allowing only line segments, and modeling circles as polygons.

² Actually, the Common Lisp function (atan (- (y q) (y p)) (- (x q) (x p))) is used to handle orientation in all four quadrants and the case where $\theta = \pm\pi/2$, for which the quotient above is undefined.

³ This implies the somewhat unrealistic assumption that circles have a perceivable orientation. The reason for this simplification will be discussed on page 12T.
that $-\pi < \theta \le \pi$. Note that the leftward orientation is normalized to $\pi$ and not $-\pi$. The reason for this will be discussed on page Kir). Axes of translation will be specified as orientations. Given the orientation $\theta$ of an axis of translation, translation along the axis in the opposite direction is accomplished via a translation with the orientation $\theta + \pi$, suitably normalized. In a similar fashion, amounts of rotation about pivot points will be specified via angles. If $\phi$ denotes an amount of rotation in one direction then $-\phi$ denotes the amount of rotation in the opposite direction.
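The orientation and normalization conventions can be sketched in Python. This is a minimal illustration rather than Abigail's Common Lisp code: `orientation`, `normalize_angle`, and `opposite_axis` are hypothetical names, and Python's `math.atan2` stands in for the two-argument Common Lisp `atan` mentioned in the footnote above. The explicit normalization step matters because `math.atan2(-0.0, -1.0)` returns $-\pi$; mapping that boundary case to $\pi$ keeps the leftward orientation at $\pi$, as the text requires.

```python
import math

Point = tuple[float, float]  # (x, y) coordinates on the movie screen


def normalize_angle(theta: float) -> float:
    """Map an angle into the half-open interval (-pi, pi]."""
    theta = math.fmod(theta, 2 * math.pi)  # now strictly inside (-2*pi, 2*pi)
    if theta <= -math.pi:
        theta += 2 * math.pi  # -pi itself becomes +pi
    elif theta > math.pi:
        theta -= 2 * math.pi
    return theta


def orientation(p: Point, q: Point) -> float:
    """theta(p, q): orientation of the line from p to q, valid in all quadrants."""
    return normalize_angle(math.atan2(q[1] - p[1], q[0] - p[0]))


def opposite_axis(theta: float) -> float:
    """Axis of translation pointing the opposite way: theta + pi, normalized."""
    return normalize_angle(theta + math.pi)
```

For example, `orientation((0.0, 0.0), (-1.0, 0.0))` yields $\pi$, and reversing that axis with `opposite_axis` yields $0$.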

I will denote the distance between two points $p$ and $q$ as $\Delta(p, q)$:

$$\Delta(p, q) = \sqrt{(x(p) - x(q))^2 + (y(p) - y(q))^2}$$


The size of a line segment is its length, the distance $\Delta(p(f), q(f))$ between its two endpoints. The size of a circle is its perimeter, $2\pi\Delta(p(f), q(f))$. Figures also have a mass, denoted $m(f)$, which is taken to be equal to their size. Figures have a center-of-mass. The values $\bar{x}(f)$ and $\bar{y}(f)$ denote the coordinates of the center-of-mass of a figure $f$. The center-of-mass of a line segment is its midpoint:


$$\bar{x}(f) = \frac{x(p(f)) + x(q(f))}{2} \qquad \bar{y}(f) = \frac{y(p(f)) + y(q(f))}{2}$$


The center-of-mass of a circle is its center: $\bar{x}(f) = x(p(f))$, $\bar{y}(f) = y(p(f))$.
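The definitions of distance, size, mass, and center-of-mass can be collected into a small Python sketch. It is illustrative only: the `Figure` class, its `shape` strings, and its method names are hypothetical, and circle size follows the perimeter reading $2\pi\Delta(p(f), q(f))$.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


def distance(p: Point, q: Point) -> float:
    """Delta(p, q): Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


@dataclass(frozen=True)
class Figure:
    shape: str  # hypothetical tag: "line" or "circle"
    p: Point    # line: first endpoint; circle: center
    q: Point    # line: second endpoint; circle: a point on the perimeter

    def size(self) -> float:
        """Length of a line segment; perimeter of a circle."""
        d = distance(self.p, self.q)
        return d if self.shape == "line" else 2 * math.pi * d

    def mass(self) -> float:
        """m(f) is taken to be equal to the size of f."""
        return self.size()

    def center_of_mass(self) -> Point:
        """Midpoint of a line segment; center of a circle."""
        if self.shape == "line":
            return ((self.p[0] + self.q[0]) / 2, (self.p[1] + self.q[1]) / 2)
        return self.p
```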

I also define the notion of the displacement between a point and a figure, denoted $\delta(p, f)$. This will play a role in defining joint parameters in the next section. If $f$ is a line segment, then


$$\delta(p, f) = \frac{\Delta(p, p(f))}{\Delta(p(f), q(f))}$$


Such a displacement is called a translational displacement. Since displacements are used only for points forming joints between figures, the point $p$ will always lie on $f$ and the displacement will always be between zero and one inclusive. If $f$ is a circle, then $\delta(p, f) = \theta(p(f), p) - \theta(f)$. Such a displacement is called a rotational displacement and will always be normalized so that $-\pi < \delta(p, f) \le \pi$.
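A sketch of $\delta(p, f)$ for both kinds of figure follows. The names are hypothetical; as in the text, it assumes the point already lies on the figure, and it applies the same $(-\pi, \pi]$ normalization to rotational displacements as to orientations.

```python
import math

Point = tuple[float, float]


def normalize_angle(theta: float) -> float:
    """Map an angle into (-pi, pi]."""
    theta = math.fmod(theta, 2 * math.pi)
    if theta <= -math.pi:
        theta += 2 * math.pi
    elif theta > math.pi:
        theta -= 2 * math.pi
    return theta


def orientation(p: Point, q: Point) -> float:
    """theta(p, q), normalized."""
    return normalize_angle(math.atan2(q[1] - p[1], q[0] - p[0]))


def displacement(point: Point, shape: str, p: Point, q: Point) -> float:
    """delta(point, f) for a figure f with p(f) = p and q(f) = q.

    Line segment: fraction of the way from p(f) to q(f) (translational).
    Circle: angle of the point about the center, measured from theta(f)
    (rotational). Assumes `point` lies on the figure.
    """
    if shape == "line":
        return math.dist(p, point) / math.dist(p, q)
    return normalize_angle(orientation(p, point) - orientation(p, q))
```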


8.1.2  Limitations  and  Simplifying  Assumptions 

At every movie frame, Abigail is presented with a set $\mathcal{F}_i$ of figures that appear in frame $i$. Several simplifying assumptions are made with respect to the sets $\mathcal{F}_i$:

1. Each figure in every frame corresponds to exactly one figure in both the preceding and following frames.

2. Abigail is given this correspondence.

3. The shape of each corresponding figure does not change from frame to frame.

4. Abigail is given the correspondence between the endpoints of corresponding line segments in successive frames. In other words, Abigail is given the distinction between a line segment whose endpoints are $(p, q)$ and one whose endpoints are $(q, p)$. This allows Abigail to assign an unambiguous orientation to every line segment.

5. Abigail can perceive two concentric equiradial circles as separate figures even though they overlap. Abigail can also perceive two collinear intersecting line segments as separate figures. This means, for instance, that when a knee is straightened so that the thigh and calf are collinear, they are still perceived by Abigail as distinct line segments even though they may be depicted graphically as a single line segment.
Collectively, these simplifying assumptions⁴ imply that Abigail need only maintain a single set $\mathcal{F}$ of figures invariant over time. Only the coordinates of the points of the figures can change from frame to frame. These assumptions also imply several restrictions on Abigail's ontology. First, individual figures are never created, destroyed, split, fused, or bent. This is not a severe restriction since figures are only the atomic elements out of which objects are constructed. Objects, being sets of figures, can nonetheless be created, destroyed, split, fused, or bent by changing the attachment relationships between the figures constituting those objects. Second, figures cannot appear or disappear. They can never enter or leave the field of view and are never occluded. Since objects are composed of figures, this implies that objects, as well, never enter or leave the field of view. While infants possess the notion of object permanence from a very early age, such a notion has not yet been incorporated into Abigail. This severe restriction will not be addressed in this thesis. Finally, these assumptions imply that Abigail is given the continued identity of objects over time.

Object perception can be broken down into three distinct tasks: segmentation, classification, and identification. Segmentation is the process of grouping figures together into objects. Classification is the process of assigning a type to an object based on its relation to similar objects. Identification is the process of tracking the identity of an object, that is, determining that some object is the same as one previously seen. This thesis currently addresses only segmentation. The per-frame analysis discussed in section 8.2.1 is a novel approach to image segmentation based on naive physical knowledge. Extending this approach to address object classification and identification is an area left for future research.

It is possible to relax the assumptions that Abigail be provided with the figure and endpoint correspondences (assumptions 2 and 4 above), and have her recover such correspondences herself, provided that such correspondences do exist to be recovered and the remaining assumptions still hold. One way to extend Abigail to recover the figure and endpoint correspondences would be to consider only matchings that pair figures of the same shape, and choose the matching that minimizes the sum of the distances between the points of the paired figures. If the frame rate is high enough relative to object velocities, a simple greedy optimization algorithm, perhaps with some hill climbing, should suffice. This approach would be a simple first step toward addressing object identification. It has not been attempted since it is tangential to the main focus of this work.
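Since this extension was never attempted, the following is only a hypothetical sketch of the greedy-plus-hill-climbing idea described above. Figures are represented as hypothetical `(shape, p, q)` tuples. The cost of pairing two line segments takes the cheaper of the two endpoint orderings, which also recovers the endpoint correspondence of assumption 4. Pairwise-swap hill climbing is one simple choice among many.

```python
import itertools
import math

Point = tuple[float, float]
Fig = tuple[str, Point, Point]  # hypothetical (shape, p, q) representation


def pair_cost(a: Fig, b: Fig) -> tuple[float, bool]:
    """Cost of matching old figure a to new figure b, and whether b's
    endpoints should be swapped. Figures of different shapes never match."""
    if a[0] != b[0]:
        return math.inf, False
    straight = math.dist(a[1], b[1]) + math.dist(a[2], b[2])
    if a[0] != "line":
        return straight, False  # a circle's p is its center: no swapping
    swapped = math.dist(a[1], b[2]) + math.dist(a[2], b[1])
    return (swapped, True) if swapped < straight else (straight, False)


def match_figures(old: list[Fig], new: list[Fig]) -> dict[int, int]:
    """Map indices of old-frame figures to indices of new-frame figures."""
    cost = [[pair_cost(a, b)[0] for b in new] for a in old]
    # Greedy phase: repeatedly take the cheapest remaining compatible pair.
    candidates = sorted((cost[i][k], i, k)
                        for i in range(len(old)) for k in range(len(new)))
    match: dict[int, int] = {}
    used: set[int] = set()
    for c, i, k in candidates:
        if i not in match and k not in used and c < math.inf:
            match[i] = k
            used.add(k)
    # Hill-climbing phase: swap the targets of two old figures while the
    # total cost strictly decreases.
    improved = True
    while improved:
        improved = False
        for i1, i2 in itertools.combinations(list(match), 2):
            k1, k2 = match[i1], match[i2]
            if cost[i1][k2] + cost[i2][k1] < cost[i1][k1] + cost[i2][k2]:
                match[i1], match[i2] = k2, k1
                improved = True
    return match
```

After matching, `pair_cost(old[i], new[match[i]])[1]` tells whether the endpoints of a matched line segment should be swapped.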

Many of Abigail's perceptual mechanisms are phrased in terms of the notions intersect, touch, and overlap. Two figures intersect if they share a common point. Two line segments touch if they intersect at a single point and that intersection point is coincident with an endpoint of one of the line segments. Two circles touch if they intersect at a single point. A line segment and a circle touch either if the line segment is tangent to the circle, or if one of the two possible intersection points is coincident with an endpoint of the line segment. Two figures overlap if they intersect but do not touch, except that a line segment and a circle can both overlap and touch if one intersection point is coincident with an endpoint of the line segment while the other is not. Figure 8.1 gives a pictorial depiction of these notions and enumerates the different possible relations between two figures. The left-hand column depicts the possible relations between two line segments. The center column depicts the possible relations between a line segment and a circle. The right-hand column depicts the possible relations between two circles. Cases (a) through (h) depict touching relations, and cases (i) through (k) depict overlap relations. Case (l) depicts the only instance where two figures can both touch and overlap simultaneously.

For reasons which will be discussed in section 9.3.4, these notions of intersect, touch, and overlap must be made 'fuzzy'. In this fuzzy definition of intersection, two figures intersect if the closest distance between a point on one and a point on the other is within some tolerance. If the two figures do not actually intersect, the midpoint between those two closest points is taken to be the intersection point for determining the touch and overlap relations. Finally, two points are taken to be coincident if the distance between them is within some tolerance.
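The segment-segment case of these fuzzy notions can be sketched as follows. This is an illustrative sketch, not Abigail's implementation: the closest-point routine follows the standard segment-segment algorithm from Ericson's Real-Time Collision Detection, the tolerance `tol` is a hypothetical parameter, and parallel segments that overlap collinearly along a stretch are not treated specially.

```python
import math

Point = tuple[float, float]


def _clamp(t: float) -> float:
    return max(0.0, min(1.0, t))


def closest_points(p1: Point, q1: Point, p2: Point, q2: Point) -> tuple[Point, Point]:
    """Closest pair of points between segments p1q1 and p2q2.
    Assumes both segments have nonzero length."""
    d1 = (q1[0] - p1[0], q1[1] - p1[1])
    d2 = (q2[0] - p2[0], q2[1] - p2[1])
    r = (p1[0] - p2[0], p1[1] - p2[1])
    a = d1[0] * d1[0] + d1[1] * d1[1]
    e = d2[0] * d2[0] + d2[1] * d2[1]
    f = d2[0] * r[0] + d2[1] * r[1]
    c = d1[0] * r[0] + d1[1] * r[1]
    b = d1[0] * d2[0] + d1[1] * d2[1]
    denom = a * e - b * b  # zero when the segments are parallel
    s = _clamp((b * f - c * e) / denom) if denom > 0 else 0.0
    t = (b * s + f) / e
    if t < 0.0:
        t, s = 0.0, _clamp(-c / a)
    elif t > 1.0:
        t, s = 1.0, _clamp((b - c) / a)
    return ((p1[0] + d1[0] * s, p1[1] + d1[1] * s),
            (p2[0] + d2[0] * t, p2[1] + d2[1] * t))


def relation(p1: Point, q1: Point, p2: Point, q2: Point, tol: float = 1e-6) -> str:
    """Fuzzy relation between two line segments: 'disjoint', 'touch', or 'overlap'."""
    c1, c2 = closest_points(p1, q1, p2, q2)
    if math.dist(c1, c2) > tol:
        return "disjoint"  # the segments do not (fuzzily) intersect
    mid = ((c1[0] + c2[0]) / 2, (c1[1] + c2[1]) / 2)  # fuzzy intersection point
    if any(math.dist(mid, end) <= tol for end in (p1, q1, p2, q2)):
        return "touch"  # intersection point coincides with an endpoint
    return "overlap"
```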

⁴ For efficiency reasons, the current implementation of Abigail adds the additional assumption that the size of corresponding figures is invariant across frames, though this assumption is not fundamental and is easily lifted.
These are not meant to be taken as definitions of the words intersect, touch, and overlap. Rather, they are low-level perceptual notions out of which higher-level definitions of these words, and others, can be constructed.


8.1.3  Joints 

Part of Abigail's ontology is the knowledge that figures can be joined, fastened, or attached together. A joint is a constraint that two figures intersect. I will denote a joint with the (possibly subscripted) symbol $j$. The two figures joined by a joint $j$ are denoted $f(j)$ and $g(j)$.

Joints can optionally further constrain the relative motion between two figures. Since each figure has three degrees of freedom (the $(x, y)$ position of one endpoint and its orientation), a joint can potentially constrain each of these three degrees of freedom of one figure relative to another it is joined to. Thus a joint may specify three parameters, each of which independently constrains one of the degrees of freedom. Each of these parameters may be either real-valued or nil. A nil value for a parameter signifies that a joint is flexible along that degree of freedom, while a real value specifies that it is rigid. Joints can be independently rigid or flexible along each degree of freedom. A rigid rotation parameter $\theta(j)$ constrains the angle between the orientations of the two joined figures to be equal to the parameter setting: $\theta(j) = \theta(g(j)) - \theta(f(j))$. The remaining two joint parameters are the displacement parameters $\delta_f(j)$ and $\delta_g(j)$, which partially constrain the displacement of the intersection point relative to each figure. Since the two figures of a joint must intersect, one can denote their intersection point as $p(j)$. If $\delta_f(j)$ is rigid then the constraint $\delta_f(j) = \delta(p(j), f(j))$ is enforced. Likewise, if $\delta_g(j)$ is rigid then the constraint $\delta_g(j) = \delta(p(j), g(j))$ is enforced.⁵ Note that giving circles orientations allows defining the concept of rotational displacement. Without such a concept, fixing the relative positions of two joints, each joining a different line segment to the same circle, would require a complex constraint specification between all three figures. With the notion of rotational displacement, the displacement of each line segment relative to the circle can be fixed independently as a constraint between two figures.
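One way to represent a joint, with Python's `None` standing in for nil, is sketched below. The class and field names and the tolerance `EPS` are hypothetical. The angle comparison is fuzzy and respects wrap-around, in the spirit of the roundoff footnote in this section.

```python
import math
from dataclasses import dataclass
from typing import Optional

EPS = 1e-6  # hypothetical tolerance for fuzzy equality


def angles_equal(a: float, b: float, eps: float = EPS) -> bool:
    """Fuzzy angle equality that respects wrap-around, so that
    -pi + e and pi - e compare as equal for small e."""
    diff = abs(math.fmod(a - b, 2 * math.pi))
    return min(diff, 2 * math.pi - diff) <= eps


@dataclass
class Joint:
    f: int  # index of figure f(j)
    g: int  # index of figure g(j)
    theta: Optional[float] = None    # rotation parameter; None means flexible
    delta_f: Optional[float] = None  # displacement relative to f(j)
    delta_g: Optional[float] = None  # displacement relative to g(j)

    def rotation_ok(self, theta_f: float, theta_g: float) -> bool:
        """Check theta(j) = theta(g(j)) - theta(f(j)) if the rotation is rigid."""
        return self.theta is None or angles_equal(self.theta, theta_g - theta_f)

    def well_formed(self) -> bool:
        """At least one displacement parameter must be rigid."""
        return self.delta_f is not None or self.delta_g is not None
```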

Since two figures may have more than one intersection point, I add an additional simplifying assumption about joints to allow unambiguous determination of the intersection point $p(j)$. I require that at least one of the displacement parameters of each joint be rigid. Subject to this constraint, the intersection point can be found by using whichever of the following formulas is applicable. If $\delta_f(j)$ is rigid and $f(j)$ is a line segment then

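$$\begin{aligned} x(p(j)) &= x(p(f(j))) + \delta_f(j) \times \bigl(x(q(f(j))) - x(p(f(j)))\bigr)\\ y(p(j)) &= y(p(f(j))) + \delta_f(j) \times \bigl(y(q(f(j))) - y(p(f(j)))\bigr) \end{aligned}$$

If $\delta_f(j)$ is rigid and $f(j)$ is a circle then

$$\begin{aligned} x(p(j)) &= x(p(f(j))) + \Delta(p(f(j)), q(f(j)))\cos\bigl(\delta_f(j) + \theta(f(j))\bigr)\\ y(p(j)) &= y(p(f(j))) + \Delta(p(f(j)), q(f(j)))\sin\bigl(\delta_f(j) + \theta(f(j))\bigr) \end{aligned}$$

If $\delta_g(j)$ is rigid and $g(j)$ is a line segment then

$$\begin{aligned} x(p(j)) &= x(p(g(j))) + \delta_g(j) \times \bigl(x(q(g(j))) - x(p(g(j)))\bigr)\\ y(p(j)) &= y(p(g(j))) + \delta_g(j) \times \bigl(y(q(g(j))) - y(p(g(j)))\bigr) \end{aligned}$$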

⁵ Due to roundoff problems, a fuzzy notion of equality must be used to enforce joint parameters. The fuzzy comparison of angles must take normalization into account. This requires equating $-\pi + \epsilon$ to $\pi - \epsilon$.
If $\delta_g(j)$ is rigid and $g(j)$ is a circle then

$$\begin{aligned} x(p(j)) &= x(p(g(j))) + \Delta(p(g(j)), q(g(j)))\cos\bigl(\delta_g(j) + \theta(g(j))\bigr)\\ y(p(j)) &= y(p(g(j))) + \Delta(p(g(j)), q(g(j)))\sin\bigl(\delta_g(j) + \theta(g(j))\bigr) \end{aligned}$$
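The four formulas for $p(j)$ reduce to one helper applied to whichever figure has a rigid displacement. In this sketch figures are hypothetical `(shape, p, q)` tuples and a nil parameter is `None`. Preferring $\delta_f(j)$ when both are rigid is an arbitrary choice, since both should yield the same point when the joint's constraints hold.

```python
import math
from typing import Optional

Point = tuple[float, float]
Fig = tuple[str, Point, Point]  # hypothetical (shape, p, q) representation


def orientation(p: Point, q: Point) -> float:
    """theta(p, q); normalization is unnecessary inside cos and sin."""
    return math.atan2(q[1] - p[1], q[0] - p[0])


def point_at(fig: Fig, delta: float) -> Point:
    """The point with displacement delta on a figure."""
    shape, p, q = fig
    if shape == "line":  # translational: interpolate from p(f) toward q(f)
        return (p[0] + delta * (q[0] - p[0]), p[1] + delta * (q[1] - p[1]))
    radius = math.dist(p, q)            # Delta(p(f), q(f))
    angle = delta + orientation(p, q)   # rotational: offset from theta(f)
    return (p[0] + radius * math.cos(angle), p[1] + radius * math.sin(angle))


def intersection_point(f: Fig, g: Fig, delta_f: Optional[float],
                       delta_g: Optional[float]) -> Point:
    """p(j) for a joint between f and g; at least one delta must be rigid."""
    if delta_f is not None:
        return point_at(f, delta_f)
    if delta_g is not None:
        return point_at(g, delta_g)
    raise ValueError("a joint needs at least one rigid displacement parameter")
```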

As part of her pre-linguistic endowment, Abigail knows that figures can be fastened by joints and that joints have the aforementioned properties. Furthermore, she knows how these properties affect the motion of joined figures under the effects of gravity and related naive physical constraints. This knowledge is embodied in an imagination capacity which will be discussed in chapter 9. However, her perceptual processes do not allow her to directly perceive the existence of joints in the movie she is watching. As perceptual input, she is given only the positions, orientations, shapes, and sizes of figures in each movie frame. She is not told which figures are joined and how they are joined. She must infer this information from the image figure data alone. Furthermore, which figures are joined and the parameters of those joints may change over time. Joints may be broken, as happens when a leg is removed from the table. New joints may be formed, as would happen if a table were built by attaching its legs to the table top. Rigid joint parameters may become flexible and flexible joint parameters may become rigid. At all times Abigail maintains a joint model, a set of joints $\mathcal{J}$ and their parameters, that she currently believes to reflect what is happening in the movie. The process by which she updates this joint model will be described in section 8.2.1.


8.1.4  Layers 

Abigail's micro-world is nominally two-dimensional. The movie input has only $x$ and $y$ coordinates. A two-dimensional world, however, is very constraining. If one wants to model the substantiality constraint in such a world, the movement of objects would be severely restricted. For instance, in the movie described in section 6.1, John would not be able to walk, as he does, from one side of the table to the other, for in doing so, he would violate substantiality. People have no difficulty understanding that movie even though they, too, perceive only a two-dimensional image. That is because human world ontology is three-dimensional and human perception understands two-dimensional depictions of a three-dimensional world. So a human watching the movie described in section 6.1 would assume that John walked either in front of the table, or behind it, as he passed from one side to the other.

I want to be able to model such a capacity in Abigail as well. Thus part of Abigail's pre-linguistic endowment is the knowledge that each figure in the world resides on some layer. Two figures may either be on the same layer or on different layers. I will denote the fact that two figures $f$ and $g$ are on the same layer by the assertion $f \bowtie g$, and the fact that they are on different layers by the assertion $f \not\bowtie g$. These layer assertions affect whether the substantiality constraint holds between a pair of figures. Two figures which are on the same layer must not overlap. The substantiality constraint does not apply to figures on different layers.

As with joints, Abigail is not given layer assertions as direct input. She must infer which figures are on the same layer, and which are on different layers, solely from image figure data. Again, much in the same way that joint parameters change during the course of a movie, figures can move from layer to layer as the movie progresses. Thus which layer assertions are true may change over time. Abigail maintains a layer model which consists of a set $\mathcal{L}$ of layer assertions that reflects her current understanding of the movie. The process by which she updates this layer model will be discussed in section 8.2.1.

Abigail treats the same-layer relation as an equivalence relation. The $\bowtie$ relation embodied in $\mathcal{L}$ is thus reflexive, symmetric, and transitive. The layer model must also be consistent. It cannot imply that two figures are both on the same layer and on different layers simultaneously. Furthermore, if the layer model implies neither that two figures are on the same layer nor that they are on different layers, Abigail will assume that they are on different layers by default. Layer assertions are a weak form of information about the third dimension. In particular, there is no notion of one figure being in front of or behind another figure, nor is there a notion of two figures being on adjacent layers. No further knowledge implied by our intuitive notion of 'layer' is modeled beyond layer equivalence.
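Since same-layer assertions are closed under an equivalence relation, a union-find structure is a natural way to compute the closure and check consistency. The sketch below is an illustration under stated assumptions, not Abigail's implementation: layer assertions are hypothetical `(f, g, same)` triples, and any pair not implied to be on the same layer is reported as being on different layers, matching the default described above.

```python
from collections.abc import Hashable

# Hypothetical encoding: (f, g, True) means f ⋈ g, (f, g, False) means f ⋈̸ g.
Assertion = tuple[Hashable, Hashable, bool]


class LayerClosure:
    """Closure of a set of layer assertions under the equivalence axioms."""

    def __init__(self, assertions: list[Assertion]):
        self.parent: dict[Hashable, Hashable] = {}
        for f, g, same in assertions:
            if same:
                self._union(f, g)
        # Different-layer assertions, recorded between equivalence classes.
        # A one-element frozenset means both figures share a class.
        self.different = {frozenset((self.find(f), self.find(g)))
                          for f, g, same in assertions if not same}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _union(self, a: Hashable, b: Hashable) -> None:
        self.parent[self.find(a)] = self.find(b)

    def consistent(self) -> bool:
        """Inconsistent iff some f ⋈̸ g relates two members of one class."""
        return all(len(pair) == 2 for pair in self.different)

    def same_layer(self, f: Hashable, g: Hashable) -> bool:
        """True only if the closure implies f ⋈ g; otherwise the default
        assumption is that f and g are on different layers."""
        return self.find(f) == self.find(g)
```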


8.2  Perceptual  Processes 

Having presented the ontology which Abigail projects onto the world, it is now possible to describe the process by which she perceives support, contact, and attachment relations between objects in the movie. Recall that Abigail has no prior knowledge about the types or delineation of objects in the world. She interprets any set of figures connected by joints as an object. To do so, she must know which figures are joined. Not being given that information as input, her first task is to form a model of the image that describes which figures are joined. Since the attachment status of figures may change from frame to frame as the movie unfolds, she must repeat the analysis which derives the joint model as part of the processing for each new frame. The ontology which Abigail projects onto an image includes a layer model in addition to a joint model. Since Abigail is given only two-dimensional information as input, she must infer information about the third dimension in the form of layer assertions in the layer model. Again, since figures can move from layer to layer during the course of the movie, Abigail must update both the layer and joint models on a per-frame basis. Thus Abigail performs two stages of processing for each frame. In the first stage she updates the joint and layer models for the image. The derived joint model delineates the objects which appear in the image. In the second stage she uses the derived joint and layer models to recover support, contact, and attachment relations between the perceived objects. The architecture used by Abigail to process each movie frame is depicted in figure 8.2. The architecture takes as input the positions, orientations, shapes, and sizes of the figures constituting the image, along with a joint and layer model for the image. The architecture updates this joint and layer model, groups the figures into objects, and recovers support, contact, and attachment relations between those objects. Central to the event perception architecture is an imagination capacity which encodes naive physical knowledge such as the substantiality, continuity, gravity, and ground plane constraints.
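Interpreting any set of figures connected by joints as an object amounts to computing the connected components of a graph whose vertices are figures and whose edges are joints. A minimal sketch, assuming figures are identified by hypothetical integer indices and each joint is reduced to the pair $(f(j), g(j))$; unjoined figures become single-figure objects:

```python
from collections import defaultdict


def objects_from_joints(figures: list[int],
                        joints: list[tuple[int, int]]) -> list[set[int]]:
    """Group figures into objects: maximal sets of figures connected,
    directly or indirectly, by joints."""
    neighbors: dict[int, set[int]] = defaultdict(set)
    for f, g in joints:
        neighbors[f].add(g)
        neighbors[g].add(f)
    seen: set[int] = set()
    objects: list[set[int]] = []
    for start in figures:
        if start in seen:
            continue
        component: set[int] = set()
        stack = [start]  # iterative depth-first traversal
        while stack:
            fig = stack.pop()
            if fig in component:
                continue
            component.add(fig)
            stack.extend(neighbors[fig] - component)
        seen |= component
        objects.append(component)
    return objects
```

Because the joint model is re-derived every frame, rerunning this grouping per frame lets objects split or fuse as joints are broken or formed.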

8.2.1  Deriving  the  Joint  and  Layer  Models 

As Abigail watches the movie, she continually maintains both a joint model $\mathcal{J}$ and a layer model $\mathcal{L}$. At the start of the movie, these models are empty, containing no joints and no layer assertions. After each frame of the movie, Abigail looks for evidence in the most recent frame that the joint and layer models should be changed. Most of the evidence requires that Abigail hypothesize potential changes and then imagine the effect of these changes on the world. Abigail assumes that the world is for the most part stable. Objects are typically supported. She considers an unstable world with unsupported objects to be less likely than a stable one. If the world is unstable when imagined without making the hypothesized changes, then these hypothesized changes are adopted as permanent changes to the joint and layer models. This facet of Abigail's perceptual mechanism is not justified by any experimental evidence from human perception but simply appears to work well in practice.

Abigail's  preference  for  a  stable  world  requires  that,  to  the  extent  possible,  all  objects  be  supported. 
There  are  two  ways  to  prevent  an  object  from  falling.  One  is  for  it  to  be  joined  to  some  other  supported 
figure.  The  other  is  for  it  to  be  supported  by  another  figure.  One  figure  can  support  another  figure  only 
if  they  are  on  the  same  layer,  since  support  happens  as  a  consequence  of  the  need  to  avoid  substantiality 
violations  and  substantiality  holds  only  between  two  figures  on  the  same  layer. 

Abigail's  imagination  capacity  is  embodied  in  a  kinematic  simulator.  This  simulator  can  predict 
how  a  set  of  figures  will  behave  under  the  effect  of  gravity,  given  particular  joint  and  layer  models,  such 
Figure 8.2: The event perception architecture incorporated into Abigail. The architecture takes as input the positions, orientations, shapes, and sizes of the figures constituting the image, along with a joint and layer model for the image. The architecture updates this joint and layer model, groups the figures into objects, and recovers support, contact, and attachment relations between those objects. Central to the event perception architecture is an imagination capacity which encodes naive physical knowledge such as the substantiality, continuity, gravity, and ground plane constraints.
that naive physical constraints such as substantiality are upheld. This imagination capacity, denoted $I(\mathcal{F}, \mathcal{J}, \mathcal{L})$, will be described in detail in chapter 9. The processes described here treat this capacity as modular: any simulation mechanism that accurately models gravity and substantiality will do. The event perception processes simply call $I$ with different values of $\mathcal{F}$, $\mathcal{J}$, and $\mathcal{L}$, asking different questions of the predicted future, in the process of updating the joint and layer models and recovering support relations.⁶

Abigail can change the joint and layer models in six different ways to keep those models synchronized with the world. She can

• add a layer assertion to $\mathcal{L}$,

• remove a layer assertion from $\mathcal{L}$,

• add a joint to $\mathcal{J}$,

• remove a joint from $\mathcal{J}$,

• promote a parameter of some joint $j \in \mathcal{J}$ from flexible to rigid,

• demote a parameter of some joint $j \in \mathcal{J}$ from rigid to flexible,

or perform any simultaneous combination of the above changes. Each type of change is motivated by particular evidence in the most recent movie frame, potentially mediated by the imagination process.

Abigail makes three types of changes to the layer model on the basis of evidence gained from watching each movie frame. The process can be stated informally as follows. She will add an assertion that two figures are on different layers whenever they overlap, since if they were not on different layers, substantiality would be violated. She will add an assertion that two figures are on the same layer whenever one of the figures must support the other in order to preserve the stability of the image. Finally, whenever newer layer assertions contradict older layer assertions, the older ones are removed from the layer model, giving preference to newer evidence. For example, when presented with the image from figure 6.1, Abigail will infer that the ball and the table top are on the same layer since the ball would fall if it were not supported by the table top.

The process of updating the layer model can be stated more precisely as follows. A layer model consists of an ordered set $\mathcal{L}$ of layer assertions. Initially, at the start of the movie, this set is empty. The closure of a layer model is the layer model augmented with all of the layer assertions entailed by the equality axioms. A layer model is consistent if its closure does not simultaneously imply that two figures are on the same, as well as different, layers. Abigail never replaces the layer model with its closure. She always maintains the distinction between layer assertions that have been added to the model as a result of direct evidence and those which have been derived by closure. A maximal consistent subset of a layer model $\mathcal{L}$ is a consistent subset $\mathcal{L}'$ of $\mathcal{L}$ such that any subset $\mathcal{L}''$ of $\mathcal{L}$ that is a proper superset of $\mathcal{L}'$ is inconsistent. The lexicographic maximal consistent subset of a layer model $\mathcal{L}$ is the particular maximal consistent subset of $\mathcal{L}$ returned by the following procedure.

procedure Maximal Consistent Subset(L)
    L' ← {};
    for a ∈ L
        do if L' ∪ {a} is consistent
            then L' ← L' ∪ {a} fi od;
    return L' end


⁶ As discussed in chapter 9, the imagination capacity I(ℱ, J, L, P) takes a predicate P as its fourth parameter. In informal presentations, it is simpler to omit this parameter and use the English gloss "P occurs during I(ℱ, J, L)" in place of I(ℱ, J, L, P).
The above procedure may not find the largest possible maximal consistent subset. That problem has been shown to be NP-hard by Wolfram (UIHb). This heuristic has nonetheless proven adequate in practice.
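As an illustration only, the consistency test and the lexicographic scan can be sketched in Python. The representation is an assumption of this sketch, not the thesis's: a layer assertion is a tuple ("same", f, g) or ("diff", f, g). Consistency is decided with a union-find structure: merging classes for same-layer assertions realizes the closure under the equality axioms, and the model is consistent exactly when no different-layer assertion relates two figures of one class. Like the procedure above, the scan returns a maximal subset that need not be maximum.

```python
from typing import Dict, Iterable, List, Tuple

Assertion = Tuple[str, str, str]   # ("same" | "diff", figure, figure); illustrative encoding

def _find(parent: Dict[str, str], x: str) -> str:
    """Union-find root of x, with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def consistent(assertions: Iterable[Assertion]) -> bool:
    """Merge classes for same-layer assertions (reflexive, symmetric,
    transitive closure), then check that no different-layer assertion
    falls inside a single class."""
    parent: Dict[str, str] = {}
    diffs = []
    for kind, a, b in assertions:
        if kind == "same":
            parent[_find(parent, a)] = _find(parent, b)
        else:
            diffs.append((a, b))
    return all(_find(parent, a) != _find(parent, b) for a, b in diffs)

def maximal_consistent_subset(layer_model: List[Assertion]) -> List[Assertion]:
    """Lexicographic scan: keep each assertion, in order, that is
    consistent with the assertions already kept."""
    kept: List[Assertion] = []
    for a in layer_model:
        if consistent(kept + [a]):
            kept.append(a)
    return kept
```

For instance, on the input [("same", "a", "b"), ("diff", "a", "b"), ("diff", "b", "a")] the scan keeps only the first assertion, although the last two together form a larger consistent subset.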

Given the above procedure, we can now define the process used to update the layer model. We define L⋈̸ to be the set of all different-layer assertions f ⋈̸ g where f and g overlap in the most recent movie frame. These are layer assertions which must be added to the layer model in order not to violate substantiality. We define L⋈ to be the set of all same-layer assertions f ⋈ g where f and g touch in the most recent movie frame. These are hypothesized layer assertions which could potentially account for support relationships needed to preserve stability. L⋈ contains assertions only between figures which touch, since only such assertions could potentially contribute to support relationships. The layer model updating procedure makes permanent only those hypothesized same-layer assertions that actually do prevent figures from falling under imagination. The layer model updating procedure is as follows.⁷

procedure Update Layer Model
    for f ⋈ g ∈ L⋈
        do if neither f nor g moves during
                I(ℱ, J, Maximal Consistent Subset(L⋈̸ ∪ (L⋈ − {f ⋈ g}) ∪ L))
            then L⋈ ← L⋈ − {f ⋈ g} fi od;
    L ← Maximal Consistent Subset(L⋈̸ ∪ L⋈ ∪ L) end
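A Python sketch of the same loop follows, under stated assumptions. The imagination call is replaced by a hypothetical predicate f_or_g_moves(assertion, layer_model), which should report whether either figure of the assertion moves when the short-term future is imagined under the given layer model and the current joint model. The parameter mcs is any lexicographic maximal-consistent-subset function, passed in so that the sketch stands alone. Assertions are plain tuples, an illustrative choice.

```python
from typing import Callable, List, Sequence, Tuple

Assertion = Tuple[str, str, str]   # ("same" | "diff", f, g); illustrative encoding

def update_layer_model(
    L: List[Assertion],
    L_diff: Sequence[Assertion],    # different-layer assertions for overlapping figures
    L_same: Sequence[Assertion],    # hypothesized same-layer assertions for touching figures
    mcs: Callable[[List[Assertion]], List[Assertion]],
    f_or_g_moves: Callable[[Assertion, List[Assertion]], bool],   # hypothetical imagination oracle
) -> List[Assertion]:
    """Drop each hypothesized same-layer assertion whose removal lets
    neither of its figures move, then rebuild the layer model."""
    same = list(L_same)
    for a in list(same):            # iterate over a snapshot while pruning `same`
        trial = mcs(list(L_diff) + [x for x in same if x != a] + list(L))
        if not f_or_g_moves(a, trial):
            same.remove(a)          # a is not needed to support anything
    # Newest evidence first, so the lexicographic scan favors it over older L.
    return mcs(list(L_diff) + same + list(L))
```

Because earlier removals shrink the set used in later trials, the result can depend on the order of L⋈, just as in the procedure.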


The process of updating the joint model is conceptually very similar to updating the layer model. The algorithm is illustrated in figure 8.3. First, remove all joints j from J where f(j) does not intersect g(j) in the most recent frame (lines 2 and 3). Second, demote any rigid parameter of any joint j ∈ J when the constraint implied by that parameter is violated (lines 4 through 9). Third, remove all joints j from J where both δf(j) and δg(j) are flexible (lines 10 and 11). This enforces the constraint from page 127 that every joint have at least one rigid displacement parameter. Fourth, find a minimal set of parameter promotions and new joints that preserve the stability of the image (lines 12 through 33). To do this we form the set J' of all joints j' where f(j') intersects g(j') in the most recent movie frame (lines 12 through 20). Those joints in J' which appear in J have their parameters initialized to the same values as their counterparts in J, while any new joints have their parameters initialized to be flexible. We then promote all of the flexible parameters in J' to have the rigid values that they have in the most recent movie frame. One by one, we temporarily demote each of the parameters just promoted and imagine the world (lines 21 through 33). If, when demoting a parameter of a joint j', the constraint specified by the original rigid parameter is not violated during the imagined outcome of that demotion, then that demotion is preserved. Otherwise, the parameter is promoted back to the rigid value it has in the most recent movie frame. After trying to demote each of the newly promoted joint parameters, remove all joints j' from J' where both δf(j') and δg(j') are flexible (lines 34 and 35) and replace J with J' (line 36).⁸
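The bookkeeping in this paragraph can be sketched in Python. This is a hypothetical rendering, not Abigail's code: a joint's rotation and two displacement parameters are stored as theta, delta_f and delta_g, with None standing for flexible. The callbacks intersects, observed and violated are assumed stand-ins for image geometry and for the imagination capacity. Building J' from intersecting figure pairs (lines 12 through 20) is left to the caller, who supplies each candidate joint with all parameters rigid at their observed values.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

PARAMS = ("theta", "delta_f", "delta_g")   # hypothetical parameter names

@dataclass
class Joint:
    f: str
    g: str
    theta: Optional[float] = None     # None = flexible, a number = rigid
    delta_f: Optional[float] = None
    delta_g: Optional[float] = None

def has_rigid_displacement(j: Joint) -> bool:
    return j.delta_f is not None or j.delta_g is not None

def prune(J: List[Joint],
          intersects: Callable[[str, str], bool],
          observed: Callable[[Joint, str], float],
          tol: float = 1e-6) -> List[Joint]:
    """Lines 2-11: drop joints whose figures no longer intersect, demote each
    rigid parameter contradicted by the current frame, then drop joints left
    without a rigid displacement parameter."""
    kept = []
    for j in J:
        if not intersects(j.f, j.g):
            continue
        for p in PARAMS:
            value = getattr(j, p)
            if value is not None and not math.isclose(value, observed(j, p), abs_tol=tol):
                setattr(j, p, None)
        if has_rigid_displacement(j):
            kept.append(j)
    return kept

def relax(J_new: List[Joint], J_old: List[Joint],
          violated: Callable[[List[Joint], Joint, str, float], bool]) -> List[Joint]:
    """Lines 21-35: tentatively demote every parameter of every joint in J_new.
    The demotion is undone if the matching joint in J_old already had that
    parameter rigid, or if imagining the world with the parameter flexible
    moves the relation it governed away from its recorded value."""
    old = {(j.f, j.g): j for j in J_old}
    for j in J_new:
        prev = old.get((j.f, j.g))
        for p in PARAMS:
            value = getattr(j, p)
            setattr(j, p, None)                      # tentative demotion
            if (prev is not None and getattr(prev, p) is not None) or violated(J_new, j, p, value):
                setattr(j, p, value)                 # restore the rigid value
    return [j for j in J_new if has_rigid_displacement(j)]
```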

Recall that an object can be supported in two ways: either by being joined to another object or by resting on top of another object on the same layer. Abigail gives preference to the latter explanation. Whenever the stability of an image can be explained by hypothesizing either a joint between two figures or a same-layer assertion between those two figures, the same-layer assertion will be preferred. Thus for the image in figure 6.1, Abigail infers that the ball is resting on top of the table, by virtue of the fact that they are on the same layer, and not attached to the side of the table. If Abigail did not maintain

⁷ The notation I used here and in figure 8.3 is described on page 160.

⁸ Only a simplified version of this algorithm is currently implemented. First, the implemented version does not consider promoting existing flexible joints to explain the stability of an image. Only newly created rigid joints can offer such support. Second, newly added joints are always rigid. They are demoted to be flexible only when they move. Thus rather than finding a minimal set of promotions which make the image stable, the current implementation finds a minimal set of new rigid joints to stabilize the image.
 1  procedure Update Joint Model
 2    for j ∈ J
 3      do if f(j) does not intersect g(j) then J ← J − {j} fi od;
 4    for j ∈ J
 5      do if θ(j) ≠ nil ∧ θ(j) ≠ θ(g(j)) − θ(f(j)) then θ(j) ← nil fi od;
 6    for j ∈ J
 7      do if δf(j) ≠ nil ∧ δf(j) ≠ δ(p(j), f(j)) then δf(j) ← nil fi od;
 8    for j ∈ J
 9      do if δg(j) ≠ nil ∧ δg(j) ≠ δ(p(j), g(j)) then δg(j) ← nil fi od;
10    for j ∈ J
11      do if δf(j) = nil ∧ δg(j) = nil then J ← J − {j} fi od;
12    J' ← {};
13    for f ∈ ℱ
14      do for g ∈ ℱ
15           do if f intersects g at p
16                then j' ← f — g; p(j') ← p;
17                     θ(j') ← θ(g) − θ(f);
18                     δf(j') ← δ(p, f);
19                     δg(j') ← δ(p, g);
20                     J' ← J' ∪ {j'} fi od od;
21    for j' ∈ J'
22      do j ← nil;
23         for j'' ∈ J
24           do if f(j'') = f(j') ∧ g(j'') = g(j') then j ← j'' fi od;
25         θ ← θ(j'); θ(j') ← nil;
26         if (j ≠ nil ∧ θ(j) ≠ nil) ∨ θ(g(j')) − θ(f(j')) ≠ θ during I(ℱ, J', L)
27           then θ(j') ← θ fi;
28         δf ← δf(j'); δf(j') ← nil; p ← p(j');
29         if (j ≠ nil ∧ δf(j) ≠ nil) ∨ δ(p, f(j')) ≠ δf during I(ℱ, J', L)
30           then δf(j') ← δf fi;
31         δg ← δg(j'); δg(j') ← nil;
32         if (j ≠ nil ∧ δg(j) ≠ nil) ∨ δ(p, g(j')) ≠ δg during I(ℱ, J', L)
33           then δg(j') ← δg fi od;
34    for j' ∈ J'
35      do if δf(j') = nil ∧ δg(j') = nil then J' ← J' − {j'} fi od;
36    J ← J' end

Figure  8.3:  The  algorithm  for  updating  the  joint  model.  Abigail  performs  this  procedure  as  part 
of  her  processing  of  each  frame  in  the  movie  she  watches. 
this preference, she would never form same-layer judgments, since any time a same-layer assertion can be used to provide support, a joint can be used as well. The fact that the converse is not true allows her to hypothesize joints when an object would slide off another object even if they were on the same layer.

The joint and layer models must be updated simultaneously by a tandem process rather than independently. If the joint model were updated before the layer model, there would be no way to enforce the aforementioned preference for same-layer support over joint support. On the other hand, the layer model cannot be created before the joint model. When processing the first image, starting out with an empty joint model, Abigail could not infer any layer information, since a layer model alone is insufficient to explain support. Without any joints, no set of layer assertions can improve the stability of an image. Thus the processes of updating the joint and layer models are interleaved, finding the least-cost combination of same-layer assertions and joint promotions which improves the stability of the image. When computing the cost of such a combination, same-layer assertions have lower cost than promotions of existing joints, which in turn have lower cost than creation of new joints.
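To make the cost ordering concrete, here is a brute-force Python sketch that searches subsets of candidate changes for the cheapest one that stabilizes the image. The numeric costs are assumptions (the text fixes only their order), stable is a hypothetical stand-in for imagining the short-term future under the changed models, and exhaustive enumeration is exponential, so this suits toy inputs only. The text does not say how Abigail organizes this search.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Tuple

# Assumed costs; only the order same_layer < promote_joint < new_joint comes from the text.
COST = {"same_layer": 1, "promote_joint": 2, "new_joint": 3}

Change = Tuple[str, str]   # (kind, label), e.g. ("new_joint", "ball-top"); illustrative

def least_cost_explanation(
    candidates: List[Change],
    stable: Callable[[FrozenSet[Change]], bool],   # hypothetical imagination check
) -> Optional[FrozenSet[Change]]:
    """Return a cheapest subset of candidate changes under which the image
    is stable, or None if no subset stabilizes it."""
    best: Optional[FrozenSet[Change]] = None
    best_cost = float("inf")
    for k in range(len(candidates) + 1):
        for combo in combinations(candidates, k):
            cost = sum(COST[kind] for kind, _ in combo)
            if cost < best_cost and stable(frozenset(combo)):
                best, best_cost = frozenset(combo), cost
    return best
```

For the ball on the table, where either a same-layer assertion or a new joint alone keeps the ball from falling, the same-layer assertion is chosen because it is cheaper.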

The method used by Abigail to construct and update the joint and layer models is best illustrated by way of an example. The following example depicts the actual results generated by Abigail when processing the first twelve frames of the movie described in section 6.1. Figure 8.4 shows these first twelve frames in greater detail. Since frame 0 is the first frame of the movie, Abigail starts out processing this frame with empty joint and layer models. With empty models, the world is completely unstable and collapses into a pile of rubble when the short-term future is imagined. This is depicted by the imagination sequence given in figure 8.5. Accordingly, Abigail hypothesizes the set of joints depicted in figure 8.6 and the layer assertions depicted in figure 8.7. A joint is hypothesized between every pair of intersecting figures. A same-layer assertion is hypothesized between every pair of figures that touch. A different-layer assertion is hypothesized between every pair of overlapping figures. Not all of these joints and layer assertions are necessary to explain the stability of the image. By the process described above, Abigail chooses to retain only the starred joints and layer assertions. With this new joint and layer model, the image is stable.⁹

Several things about the derived joint and layer models are worthy of discussion. First, note that the final layer model includes the following assertions¹⁰

(circle ball) ⋈ (top table)

(bottom box) ⋈ (top table)

indicating that Abigail has determined that the ball and the bottom of the box are resting on the table rather than being joined to the table top. Second, the hem of Mary's dress need only be joined to one side of her dress, since one rigid joint is sufficient to support the line segment constituting the hem. Third, the image contains a number of locations where the endpoints of multiple line segments are coincident on the same point. Such a situation arises, for example, where John's legs meet his torso. In this situation, three joints are possible.

(torso  john)  —  (right-thigh  john) 

(torso john) — (left-thigh john)

(right-thigh  john)  —  (left-thigh  john) 

Not all three of these joints are necessary to achieve a stable image, however. Any two of these joints are sufficient, since relative rigidity is transitive. Abigail arbitrarily chooses the last two joints as the ones

⁹ Except for the fact that John's and Mary's eyes fall out, since they appear unsupported. This highlights a deficiency in the ontology incorporated into Abigail's perceptual mechanisms. I will not address this anomaly, or methods for dealing with it, in this thesis.

¹⁰ In this and all further discussion, expressions such as (circle ball) denote particular figures. These figures are given names to aid in the interpretation of the results produced by Abigail. Abigail does not have access to these names during processing, so the fact that the names of several figures, e.g. (circle ball), (line-segment1 ball), etc., share the component ball in no way assists Abigail in her perceptual processing.
Frame 0, Imagination Step 12

Figure 8.5: A subsequence of images produced by Abigail while imagining the short-term future of frame 0 from the movie described in section 6.1 with empty joint and layer models.
  (hem mary) — (dress2 mary)
* (hem mary) — (dress1 mary)
  (dress2 mary) — (dress1 mary)
  (dress2 mary) — (torso mary)
  (dress2 mary) — (right-upper-arm mary)
* (dress2 mary) — (left-upper-arm mary)
  (dress1 mary) — (torso mary)
  (dress1 mary) — (right-upper-arm mary)
* (dress1 mary) — (left-upper-arm mary)
* (mouth mary) — (head mary)
* (head mary) — (torso mary)
  (torso mary) — (right-thigh mary)
* (torso mary) — (left-thigh mary)
  (torso mary) — (right-upper-arm mary)
* (torso mary) — (left-upper-arm mary)
* (right-thigh mary) — (left-thigh mary)
* (right-thigh mary) — (right-calf mary)
* (left-thigh mary) — (left-calf mary)
* (right-upper-arm mary) — (left-upper-arm mary)
* (right-upper-arm mary) — (right-fore-arm mary)
* (left-upper-arm mary) — (left-fore-arm mary)
* (mouth john) — (head john)
* (head john) — (torso john)
  (torso john) — (right-thigh john)
* (torso john) — (left-thigh john)
  (torso john) — (right-upper-arm john)
* (torso john) — (left-upper-arm john)
* (right-thigh john) — (left-thigh john)
* (right-thigh john) — (right-calf john)
* (left-thigh john) — (left-calf john)
* (right-upper-arm john) — (left-upper-arm john)
* (right-upper-arm john) — (right-fore-arm john)
* (left-upper-arm john) — (left-fore-arm john)
  (circle ball) — (line-segment3 ball)
  (circle ball) — (line-segment3 ball)
* (circle ball) — (line-segment2 ball)
  (circle ball) — (line-segment2 ball)
* (circle ball) — (line-segment1 ball)
  (circle ball) — (line-segment1 ball)
  (circle ball) — (left-leg table)
  (circle ball) — (top table)
  (circle ball) — (top table)
  (bottom box) — (right-wall box)
  (bottom box) — (left-wall box)
  (bottom box) — (right-leg table)
* (right-wall box) — (top table)
* (left-wall box) — (top table)
* (seat chair2) — (back chair2)
* (seat chair2) — (front chair2)
* (seat chair1) — (back chair1)
* (seat chair1) — (front chair1)
* (right-leg table) — (top table)
* (left-leg table) — (top table)

Figure 8.6: Abigail hypothesizes these joints when processing frame 0 of the movie depicted in figure 8.4. Since not all of these joints are necessary to explain the stability of the image, Abigail retains only the starred joints.
  (hem mary) ⋈ (dress2 mary)
  (hem mary) ⋈ (dress1 mary)
  (dress2 mary) ⋈ (dress1 mary)
  (dress2 mary) ⋈ (torso mary)
  (dress2 mary) ⋈ (right-upper-arm mary)
  (dress2 mary) ⋈ (left-upper-arm mary)
  (dress1 mary) ⋈ (torso mary)
  (dress1 mary) ⋈ (right-upper-arm mary)
  (dress1 mary) ⋈ (left-upper-arm mary)
  (mouth mary) ⋈ (head mary)
  (head mary) ⋈ (torso mary)
  (torso mary) ⋈ (right-thigh mary)
  (torso mary) ⋈ (left-thigh mary)
  (torso mary) ⋈ (right-upper-arm mary)
  (torso mary) ⋈ (left-upper-arm mary)
  (right-thigh mary) ⋈ (left-thigh mary)
  (right-upper-arm mary) ⋈ (left-upper-arm mary)
  (right-upper-arm mary) ⋈ (right-fore-arm mary)
  (mouth john) ⋈ (head john)
  (head john) ⋈ (torso john)
  (torso john) ⋈ (right-thigh john)
  (torso john) ⋈ (left-thigh john)
  (torso john) ⋈ (right-upper-arm john)
  (torso john) ⋈ (left-upper-arm john)
  (right-thigh john) ⋈ (left-thigh john)
  (right-thigh john) ⋈ (right-calf john)
  (left-thigh john) ⋈ (left-calf john)
  (right-upper-arm john) ⋈ (left-upper-arm john)
  (right-upper-arm john) ⋈ (right-fore-arm john)
* (circle ball) ⋈ (line-segment3 ball)
  (circle ball) ⋈ (left-leg table)
* (circle ball) ⋈ (top table)
  (bottom box) ⋈ (right-wall box)
  (bottom box) ⋈ (left-wall box)
  (bottom box) ⋈ (right-leg table)
* (bottom box) ⋈ (top table)
  (right-wall box) ⋈ (top table)
  (left-wall box) ⋈ (top table)
  (seat chair2) ⋈ (back chair2)
  (seat chair2) ⋈ (front chair2)
  (seat chair1) ⋈ (back chair1)
  (seat chair1) ⋈ (front chair1)
  (right-leg table) ⋈ (top table)
  (left-leg table) ⋈ (top table)
* (hem mary) ⋈̸ (right-calf mary)
* (hem mary) ⋈̸ (left-calf mary)
* (dress1 mary) ⋈̸ (left-fore-arm mary)
* (torso mary) ⋈̸ (left-fore-arm mary)
* (torso john) ⋈̸ (left-fore-arm john)
* (line-segment3 ball) ⋈̸ (line-segment2 ball)
* (line-segment3 ball) ⋈̸ (line-segment1 ball)
* (line-segment2 ball) ⋈̸ (line-segment1 ball)

Figure 8.7: Abigail hypothesizes these layer assertions when processing frame 0 of the movie depicted in figure 8.4. Since not all of these layer assertions are necessary to explain the stability of the image, Abigail retains only the starred layer assertions.
to make part of her joint model.

The joint and layer models constructed by Abigail contain a number of anomalies that point out deficiencies in the perceptual theory. First, note that (line-segment3 ball) is not connected to the remaining components of the ball. The intention was that the ball would be composed of four figures: a circle and three line segments. Abigail perceives (line-segment3 ball) to be a separate object inside the ball. This is a possible interpretation given her ontology since, being inside the ball, (line-segment3 ball) is supported by resting on the interior perimeter of the circle, and thus there is no need to postulate a joint to achieve stability. In fact, given Abigail's preference for support relations over joints, she must come to this analysis. Why then are the remaining two line segments not supported in an equivalent fashion without joints? The answer is simple. For a line segment to be so supported, it must be on the same layer as the circle. Since layer equivalence is a transitive relation, all three line segments would have to be on the same layer. They cannot be, however, as their intersection would then constitute a substantiality violation. Thus only one line segment can be explained by support. Abigail arbitrarily chooses (line-segment3 ball) as that line segment.
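The transitivity argument can be checked mechanically. Because same-layer is an equivalence relation, a consistent layer model corresponds to an assignment of a layer label to each figure. The hypothetical Python sketch below enumerates all such labelings for a circle and three pairwise-intersecting segments (an idealization of the ball, with made-up figure names) and collects which segments can share the circle's layer.

```python
from itertools import product
from typing import FrozenSet, List, Set

def circle_layer_options(segments: List[str], circle: str = "circle") -> Set[FrozenSet[str]]:
    """Try every assignment of integer layers to the circle and segments that
    keeps each pair of segments on different layers (they intersect), and
    collect the sets of segments that end up on the circle's layer."""
    figs = [circle] + segments
    options: Set[FrozenSet[str]] = set()
    for labels in product(range(len(figs)), repeat=len(figs)):
        layer = dict(zip(figs, labels))
        if all(layer[a] != layer[b]
               for i, a in enumerate(segments) for b in segments[i + 1:]):
            options.add(frozenset(s for s in segments if layer[s] == layer[circle]))
    return options
```

For three segments the possible sets are the empty set and each single segment, so at most one segment can rest on the circle's interior, and which one is kept is an arbitrary choice.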

The joint and layer models exhibit a second, more serious, anomaly. While Abigail correctly determines that the bottom of the box rests on the table top, she incorrectly decides that the vertical walls of the box are joined to the table top rather than to the box bottom. This is a plausible but unintended interpretation. Both interpretations require the same number of joints, so neither is preferable to the other. One way of driving Abigail to the intended interpretation would be to add a further level to the preference relation between joint and layer models, preferring one model over another if its joints connect smaller figures rather than larger ones, given that the two models otherwise have the same number of joints. I have not tried this heuristic to see if it would work.
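The untried tie-breaking heuristic could be expressed as a lexicographic sort key. In this hypothetical Python sketch, a model is reduced to its list of joints, figure size is an assumed numeric attribute (for instance, segment length), and the sizes in the example are invented for illustration.

```python
from typing import Dict, List, Tuple

def preference_key(joints: List[Tuple[str, str]],
                   size: Dict[str, float]) -> Tuple[int, float]:
    """Smaller keys are preferred: first fewer joints, then, among models
    with equally many joints, joints that connect smaller figures."""
    return (len(joints), sum(size[f] + size[g] for f, g in joints))

# Illustrative sizes only; the box bottom is shorter than the table top.
size = {"left-wall": 3, "right-wall": 3, "bottom": 6, "top": 20}
walls_on_top = [("left-wall", "top"), ("right-wall", "top")]
walls_on_bottom = [("left-wall", "bottom"), ("right-wall", "bottom")]
best = min([walls_on_top, walls_on_bottom], key=lambda m: preference_key(m, size))
```

Both candidate models have two joints, so the second component decides, and best is walls_on_bottom.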

At this point Abigail begins processing frame 1. Between frame 0 and frame 1, John lifted his right foot. In doing so he rotated his right knee and thigh joints. Thus the first thing Abigail does is demote the rotation parameters for the joints

(right-thigh  john)  —  (left-thigh  john) 

(right-thigh  john)  —  (right-calf  john) 

from being rigid to being flexible. The resulting image is not stable, however. Since John appears to stand on one foot, he falls over when the future is imagined.¹¹ In the process of falling, his right thigh can rotate relative to his torso, since the joint between his two thighs is now flexible. Abigail hypothesizes the existence of a new rigid joint, (torso john) — (right-thigh john). While this joint does not prevent John from falling, it does prevent the rotation of his right thigh relative to his torso during that fall. Abigail adopts that joint as part of the updated model, since she adopts any joint which prevents the relative rotation of the two figures it would connect.

At this point Abigail begins processing frame 2. Between frame 1 and frame 2, John started moving forward. In doing so he rotated his left knee and thigh joints, causing Abigail to demote the rotation parameters for the joints


(torso  john)  —  (left-thigh  john) 

(left-thigh  john)  —  (left-calf  john) 

from being rigid to being flexible. Between frame 2 and frame 3, John begins moving his right foot forward as well, pivoting his right thigh relative to his torso. This causes Abigail to demote the rotation parameter for the joint (torso john) — (right-thigh john), just created while processing frame 1, from being rigid to being flexible. The model thus constructed remains unchanged until frame 7.

¹¹ I will not show the resulting imagined image, since John falls backward out of the field of view due to the fact that his center of mass is behind his left foot. Later in the text, I will illustrate the imagined future of frame 11, where John's center of mass has shifted so that he falls forward in a visible fashion.
In frame 7, John's knees appear close together as his right leg passes his left leg. This causes Abigail to postulate a spurious joint, (right-calf john) — (left-calf john), between John's two knees. Again, while this joint does not prevent John from falling, it does reduce the movement of his legs during that fall. This reduction in leg movement prompts Abigail to adopt the joint as part of her joint model. This spurious joint is then dropped from the joint model in a later frame, once (right-calf john) and (left-calf john) no longer intersect. Furthermore, as a result of observing the right leg pass the left leg during its forward motion, Abigail adds the following two assertions to the layer model

(left-thigh john) ⋈̸ (right-calf john)

(right-calf john) ⋈̸ (left-calf john)

since otherwise a substantiality violation would have occurred. At this point, the model remains unchanged through frame 11.

Figure 8.8 depicts the sequence of images produced by Abigail while imagining the short-term future of frame 11. For reasons discussed previously, John's and Mary's eyes fall out in steps 1 and 2. In step 3, John pivots about his left leg until his right foot reaches the floor. In step 4, he pivots about his right foot until his right knee reaches the floor. In step 5, he then pivots about his right knee until both his hand and head reach the floor. This is possible since his right knee has a flexible rotation parameter. Note that his head can appear to pass through the chair, since Abigail assumes that objects are on different layers unless she has explicit reason to believe that they are on the same layer. Finally, in step 6, his left calf pivots about his left knee until his left foot reaches the floor. Again, this is possible since his left knee has a flexible rotation parameter.

One can imagine other sources of evidence which can be used to update the joint and layer models. Collisions can be used to determine that two objects are on the same layer, since two objects must be on the same layer in order to collide. A sequence of frames where one object moves toward another object but, upon contact (or approximate contact, given the finite frame rate), begins moving away from that object can be interpreted as a collision event, giving evidence that the contacting figures of each object are on the same layer. Such inference could provide information not derivable by the procedure previously described. It is not currently implemented, as determining collisions requires tracking the momentum of objects across frames, whereas Abigail currently processes each frame individually.
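Since collision detection is not implemented, the following is purely a hypothetical Python sketch of the approach, contact, and retreat pattern just described. It assumes that a per-frame distance between the nearest figures of two objects is already available, and it uses the sign of the change in that distance as a crude proxy for the momentum tracking the text calls for. The tolerance contact_tol absorbs the finite frame rate. A detected collision would then license a same-layer assertion between the contacting figures.

```python
from typing import List, Sequence

def collision_frames(distance: Sequence[float], contact_tol: float) -> List[int]:
    """Frames t at which two objects were approaching (the distance fell
    into t), are in approximate contact at t, and move apart afterwards."""
    hits = []
    for t in range(1, len(distance) - 1):
        approaching = distance[t] < distance[t - 1]
        retreating = distance[t + 1] > distance[t]
        if approaching and retreating and distance[t] <= contact_tol:
            hits.append(t)
    return hits
```

An object that comes to rest against another, so that the distance stays small, yields no collision under this test; only a bounce does.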

The continuity constraint offers another source of evidence which can be used to infer that objects are on different layers. Seeing an object totally enclosed by another object in one frame, and then outside that object in the following frame, gives evidence that the two objects are on different layers, even without a directly observed substantiality violation, since there would be no way for that transition to occur, given continuous movement and the substantiality constraint, unless the two objects were on different layers. In contrast to collisions, however, this would offer little additional inferential power, since, given a sufficiently high frame rate relative to object velocities, the observer would see an intermediate frame with a direct substantiality violation.

8.2.2  Deriving  Support,  Contact,  and  Attachment  Relations 

Abigail maintains a joint and layer model to reflect her understanding of the movie. These models are continually updated, on a frame-by-frame basis, by the processes described in the previous section. The models form the basis of mechanisms used to derive changing support, contact, and attachment relationships between objects in the movie. It is necessary, however, to first delineate those collections of figures which constitute objects. To this end, Abigail forms the connected components of a graph whose vertices are figures and whose edges are joints. Each connected component is taken as an object. Not all connected sets of figures constitute objects; only those which form complete connected components are taken as objects. Once a set of figures is determined to be an object, however, that set retains its status as an independent object, even though it may later be joined to another object. When that
Figure 8.8: The sequence of images produced by Abigail while imagining the short-term future of frame 11 from the movie described in section 6.1.
happens, a part-whole hierarchy is created which represents both the individual parts and the combined whole as objects. This is needed to model grasping as the formation of a joint between one's hand and the grasped object. The independent identities of both the person grasping an object and the object being grasped must be maintained, despite the creation of a spurious combined object. Likewise, when a joint is removed from the joint model, an object is broken into parts which are taken as objects. The identity of the original object is retained, however. The new parts are thought of both as objects in their own right and as parts of an object no longer in existence. Abigail considers an object to exist if the set of figures constituting that object is currently connected. In this way, Abigail can form a primitive model of the words make and break as the transition of an object from non-existence to existence and vice versa. Furthermore, since Abigail retains the identity of objects no longer in existence, it is possible to model the word fix as the transition of an object from existence to non-existence and then back again to existence.
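The object bookkeeping can be sketched in Python. In this hypothetical version, an object is the frozenset of its figures, the objects of a frame are the connected components of the figure/joint graph found by depth-first search, every object ever segmented is remembered, and an object exists in a frame exactly when its own figures are connected by the current joints. The figure names in the test are made up.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Obj = FrozenSet[str]   # an object is represented by its set of figure names

def components(figures: Iterable[str], joints: Iterable[Tuple[str, str]]) -> List[Obj]:
    """Connected components of the graph with figures as vertices and joints as edges."""
    adj: Dict[str, Set[str]] = {f: set() for f in figures}
    for f, g in joints:
        adj[f].add(g)
        adj[g].add(f)
    seen: Set[str] = set()
    result: List[Obj] = []
    for start in adj:
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:                       # iterative depth-first search
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        result.append(frozenset(comp))
    return result

def exists(obj: Obj, joints: Iterable[Tuple[str, str]]) -> bool:
    """An object exists iff its own figures are currently connected."""
    inner = [(f, g) for f, g in joints if f in obj and g in obj]
    return len(components(obj, inner)) == 1

class ObjectTracker:
    def __init__(self) -> None:
        self.objects: Set[Obj] = set()     # every object ever segmented

    def step(self, figures: Iterable[str], joints: List[Tuple[str, str]]) -> Dict[Obj, bool]:
        """Record this frame's components as objects; report which objects exist."""
        self.objects |= set(components(figures, joints))
        return {o: exists(o, joints) for o in self.objects}
```

Under this encoding, make is an object's existence flag going from false to true, break is the reverse, and fix is true, then false, then true again.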

Given the segmentation of an image into objects, the joint and layer models form the basis for
detecting contact and attachment relations between those objects. Two objects are attached if the
current joint model contains a joint between some figure of one object and some figure of the other
object. Two objects are in contact if some figure of one object both touches (in the sense described in
figure 8.1) and is on the same layer as some figure of the other object. Detecting support relations,
however, requires further use of the imagination capacity. The lexical semantic representation presented
in chapter 7 uses two different support primitives: one determines whether an object is supported, and
the other determines whether one object supports another. An object is considered supported if it does not move
when the short-term future of the world is imagined. A single call to I(T, J, L) suffices to determine
those objects which are unsupported. To determine whether an object A supports another object B,
Abigail imagines whether B would fall if A were removed. This is done by calling I(T − figures(A), J, L)
and seeing whether B moves. An object A supports another object B only if B is indeed supported. The fact
that B falls when A is removed is insufficient to infer that A supports B, since B may have fallen even
with A still in the image. Here again, a single call to I(T − figures(A), J, L) can be used to determine
all of the different objects B which are supported by A. Thus for n objects, n + 1 calls to the imagination
capacity I must be performed per frame to determine all support relationships.
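The following Python sketch restates the attachment, contact, and support tests above. It illustrates the logic under stated assumptions and is not Abigail's code. Objects are sets of figure names. `touches` and `layer_of` are hypothetical stand-ins for the touch relation of figure 8.1 and for the layer model. `imagine` is a hypothetical callable standing in for I(T, J, L), with J and L held fixed, that returns the figures that move. The `toy_imagine` physics used in the example was invented purely for illustration. A call counter confirms the n + 1 imagination calls per frame.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Figure = str
Obj = FrozenSet[Figure]
# Hypothetical stand-in for I(T, J, L): J and L are held fixed, the argument is
# the set of figures T present, and the result is the set of figures that move
# while the short-term future is imagined.
Imagine = Callable[[FrozenSet[Figure]], Set[Figure]]


def attached(a: Obj, b: Obj, joints: Iterable[Tuple[Figure, Figure]]) -> bool:
    """Some joint links a figure of a to a figure of b."""
    return any((x in a and y in b) or (x in b and y in a) for x, y in joints)


def in_contact(a: Obj, b: Obj, touches: Callable[[Figure, Figure], bool],
               layer_of: Dict[Figure, int]) -> bool:
    """Some figure of a touches, and is on the same layer as, some figure of b."""
    return any(touches(x, y) and layer_of[x] == layer_of[y] for x in a for y in b)


def support_relations(objects: Set[Obj], figures: FrozenSet[Figure], imagine: Imagine):
    """Return (supported objects, set of (A, B) pairs with A supporting B, call count)."""
    moved = imagine(figures)  # 1 call: what moves in the unaltered world
    calls = 1
    supported = {o for o in objects if not (o & moved)}
    supports = set()
    for a in objects:
        moved_without_a = imagine(figures - a)  # 1 call per object A removed
        calls += 1
        for b in objects:
            # A supports B only if B was supported and moves once A is gone.
            if b != a and b in supported and (b & moved_without_a):
                supports.add((a, b))
    return supported, supports, calls


def toy_imagine(rests_on: Dict[Figure, Figure], fixed: Set[Figure]) -> Imagine:
    """Invented toy physics: a figure stays put if fixed or resting on a figure that stays put."""
    def imagine(present: FrozenSet[Figure]) -> Set[Figure]:
        stable: Set[Figure] = set()
        changed = True
        while changed:
            changed = False
            for f in present:
                if f not in stable and (f in fixed or rests_on.get(f) in stable):
                    stable.add(f)
                    changed = True
        return set(present) - stable
    return imagine


figs = frozenset({"ground", "table", "ball", "bird"})
objects = {frozenset({f}) for f in figs}
imagine = toy_imagine({"table": "ground", "ball": "table"}, fixed={"ground"})
supported, supports, calls = support_relations(objects, figs, imagine)
print(sorted(next(iter(o)) for o in supported), calls)
# ['ball', 'ground', 'table'] 5
```

With four single-figure objects, the sketch makes five imagination calls. The unsupported bird is never reported as supported by anything. The ground is reported as supporting the table and, indirectly, the ball.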

The recovery of support, contact, and attachment relations from image sequences is best illustrated
by way of several examples. Since the full movie from section 6.1 is fairly complex, I will first illustrate the
results produced by Abigail while processing a much shorter and simpler movie. This movie depicts
a single object, John, taking two steps forward, turning around, and taking two steps in the other
direction. It contains 68 frames, each containing 10 line segments and 2 circles. Figure 8.9 depicts the
pivotal frames of this short movie.

Abigail is able to fully process this movie in several minutes of elapsed time on a Symbolics XL1200
computer, taking several seconds per frame. This is within two orders of magnitude of the processing
speed necessary to analyze such a movie in real time. The result of Abigail's analysis is depicted by
the event graph illustrated in figure 8.10. Each edge in this graph denotes some collection of perceptual
primitives which hold during the interval spanned by that edge. Figures 8.11 and 8.12 enumerate the
perceptual primitives associated with each edge in this graph.
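Below is a minimal sketch of how listed primitives might be grouped into edges. As a simplification, it assumes that an edge is identified by the frame interval over which its primitives hold. The sample entries are copied from figure 8.11. The grouping function is hypothetical and is not taken from Abigail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (first frame, last frame)


def event_graph_edges(primitives: List[Tuple[Interval, str]]) -> Dict[Interval, List[str]]:
    """Group primitives by the interval over which they hold; one edge per interval."""
    edges: Dict[Interval, List[str]] = defaultdict(list)
    for interval, primitive in primitives:
        edges[interval].append(primitive)
    return dict(sorted(edges.items()))  # order edges by start frame, then end frame


primitives = [
    ((0, 0), "(PLACE [JOHN-part] PLACE-0)"),
    ((0, 0), "(SUPPORTED [JOHN-part])"),
    ((0, 1), "(PLACE [(EYE JOHN)] PLACE-1)"),
    ((2, 15), "(MOVING-ROOT [JOHN-part])"),
    ((16, 16), "(SUPPORTED [JOHN-part])"),
]
edges = event_graph_edges(primitives)
for (start, end), prims in edges.items():
    print(f"{start}-{end}: {prims}")
```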

Inefficient design of the structure of the current implementation requires I(T, J, L) to be called independently for each
object. Remedying that inefficiency should dramatically improve the performance of the system.

For the same reasons as mentioned before, the current implementation must call I for each pair of objects, thus
requiring n² calls. To mitigate this inefficiency somewhat, the current implementation only discerns direct support,
i.e. support relations between objects in contact with each other. Indirect support can be derived by taking the transitive
closure of the direct support relation. This efficiency improvement could be combined with the strategy suggested in the
text whereby I(T − figures(A), J, L) would be called only if A was in contact with some other object.

The perceptual primitives are predicates which hold of objects. As far as Abigail is concerned, objects are simply
collections of figures. To make the output more readable, however, objects are printed using notation like [JOHN]. This
printed notation for objects is derived from the names of the figures comprising the object. Recall that figures are given
names of the form (p x), where x is an ‘intuitive’ object name given to the figure by the person creating the movie script,
and p is an analogous ‘intuitive’ part name. The printed representation [c₁ … cₙ] delineates the figures which comprise
an object by grouping those figures into components cᵢ based on the intuitive figure name assigned by the script writer.
If cᵢ is a symbol x, then it denotes the set of all figures in the image named (p x) for some p. If cᵢ is a pair (p x), then
it denotes the single figure bearing that name. If cᵢ is of the form x-part, then it denotes a set of figures in the image
named (p x) for any p, where the set contains more than one figure but not all such figures. I should stress that Abigail
does not use such annotations for anything but printing.

Figure 8.10: The event graph depicting the temporal structure of the perceptual primitives recovered
by Abigail after processing the short movie from figure 8.9. Each edge denotes some collection of
perceptual primitives which hold during the interval spanning that edge. Figures 8.11 and 8.12
enumerate the perceptual primitives associated with each edge in this graph. The edges from 2
to 15, from 18 to 32, from 36 to 49, and from 52 to 66 each correspond to a step taken by John while
walking.


In addition to support, contact, and attachment relations, the set of perceptual primitives includes
expressions for depicting various kinds of motion, as well as the location of objects and the paths followed
by objects during their motion. I will not discuss these primitives in depth, as they are tangential to the
main focus of this thesis.

At a high level, the correspondence between this event graph and the events in the movie is
intuitively obvious. In the movie, John takes four steps while continuously moving. The event graph
also depicts four sub-event clusters of the overall motion event. Each cluster further breaks down into
a transition from standing on both feet, to moving forward, to again standing on both feet. Note in
particular that John is supported in those situations where he is standing on both feet, namely
frames 0, 16, 33, 34, and 50, and not otherwise.

While this event graph bears a global resemblance to the movie, it is not adequate to detect walking

An astute reader may wonder why John doesn’t fall even when both feet are on the ground, given that his knee and
thigh joints are flexible. The reason for this will be explained in section 9.4.
[0,0] (PLACE [JOHN-part] PLACE-0)
[0,0] (SUPPORTED [JOHN-part])
[0,1] (PLACE [(EYE JOHN)] PLACE-1)
[1,67] (MOVING [JOHN-part])
[2,2] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[2,2] (ROTATING [JOHN-part])
[2,15] (MOVING-ROOT [JOHN-part])
[2,16] (TRANSLATING [(EYE JOHN)] PLACE-2)
[2,16] (MOVING-ROOT [(EYE JOHN)])
[2,16] (MOVING [(EYE JOHN)])
[2,67] (TRANSLATING [JOHN-part] PLACE-11)
[16,16] (SUPPORTED [JOHN-part])
[16,17] (PLACE [(EYE JOHN)] PLACE-3)
[18,18] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[18,18] (ROTATING [JOHN-part])
[18,32] (MOVING-ROOT [JOHN-part])
[18,32] (TRANSLATING [(EYE JOHN)] PLACE-4)
[18,32] (MOVING-ROOT [(EYE JOHN)])
[18,32] (MOVING [(EYE JOHN)])
[33,33] (PLACE [(EYE JOHN)] PLACE-6)
[33,34] (SUPPORTED [JOHN-part])


Figure  8.11:  Part  I  of  the  perceptual  primitives  recovered  by  Abigail  after  processing  the  short 
movie  from  figure  8.9. 
[34,34] (FLIPPING [JOHN-part])
[34,34] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[34,34] (ROTATING-CLOCKWISE [JOHN-part])
[34,34] (ROTATING [JOHN-part])
[34,34] (MOVING-ROOT [JOHN-part])
[34,34] (TRANSLATING [(EYE JOHN)] PLACE-6)
[34,34] (ROTATING-COUNTER-CLOCKWISE [(EYE JOHN)])
[34,34] (ROTATING [(EYE JOHN)])
[34,34] (MOVING-ROOT [(EYE JOHN)])
[34,34] (MOVING [(EYE JOHN)])
[36,36] (PLACE [(EYE JOHN)] PLACE-7)
[36,36] (ROTATING-CLOCKWISE [JOHN-part])
[36,36] (ROTATING [JOHN-part])
[36,49] (MOVING-ROOT [JOHN-part])
[36,49] (TRANSLATING [(EYE JOHN)] PLACE-8)
[36,49] (MOVING-ROOT [(EYE JOHN)])
[36,49] (MOVING [(EYE JOHN)])
[50,50] (SUPPORTED [JOHN-part])
[50,51] (PLACE [(EYE JOHN)] PLACE-9)
[52,52] (ROTATING-CLOCKWISE [JOHN-part])
[52,52] (ROTATING [JOHN-part])
[52,66] (MOVING-ROOT [JOHN-part])
[52,66] (TRANSLATING [(EYE JOHN)] PLACE-10)
[52,66] (MOVING-ROOT [(EYE JOHN)])
[52,66] (MOVING [(EYE JOHN)])


Figure  8.12:  Part  II  of  the  perceptual  primitives  recovered  by  Abigail  after  processing  the  short 
movie  from  figure  8.9. 
using the definition given in chapter 7.


(define step (x)
  (exists (i j k y)
    (and (during i (contacts y ground))
         (during j (not (contacts y ground)))
         (during k (contacts y ground))
         (equal y (foot x))
         (= (end i) (beginning j))
         (= (end j) (beginning k)))))


(define walk (x)
  (exists (i)
    (and (during i (repeat (step x)))
         (during i (move x))
         (during i
           (exists (y)
             (and (equal y (foot x))
                  (contacts y ground))))
         (during i
           (not (exists (y)
                  (and (equal y (foot x))
                       (slide-against y ground))))))))


Two major things are missing before these definitions can be applied. First, the ground must be reified as an object so that Abigail can
detect the changing contact relations between John's feet and the ground. Second, the slide-against
primitive must be implemented. Future work will address these two issues in the hope that Abigail
can detect the occurrence of walking events.
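Once the ground is reified, the contact conjuncts of the step definition could be checked over a per-frame contact sequence. The sketch below assumes a hypothetical boolean list that records whether a given foot contacts the ground in each frame. It treats two adjacent runs of frames as meeting, which stands in for (= (end i) (beginning j)). The slide-against test and the identification of (foot x) are not modeled.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[bool, int, int]  # (contact?, first frame, last frame)


def runs(contact: List[bool]) -> List[Run]:
    """Run-length encode a per-frame foot/ground contact sequence."""
    out: List[Run] = []
    t = 0
    for value, group in groupby(contact):
        n = len(list(group))
        out.append((value, t, t + n - 1))
        t += n
    return out


def find_steps(contact: List[bool]):
    """Return (i, j, k) frame intervals: contact, then no contact, then contact again."""
    r = runs(contact)
    steps = []
    for (c1, s1, e1), (c2, s2, e2), (c3, s3, e3) in zip(r, r[1:], r[2:]):
        # Consecutive runs are adjacent in time, standing in for the meets constraints.
        if c1 and not c2 and c3:
            steps.append(((s1, e1), (s2, e2), (s3, e3)))
    return steps


contact = [True, True, False, False, False, True, True, False, True]
print(find_steps(contact))
# [((0, 1), (2, 4), (5, 6)), ((5, 6), (7, 7), (8, 8))]
```

The final contact interval k of one step is the initial interval i of the next. This overlap is what lets (repeat (step x)) chain successive steps together.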

Abigail has processed a sizable portion of the larger movie described in section 6.1. While she cannot
yet process the entire movie due to processing time limitations, figure 8.13 depicts an event graph
produced for the first 172 frames of that movie. Appendix C enumerates the perceptual primitives
associated with the edges in that graph. Producing this event graph required about twelve hours of
elapsed time on a Symbolics computer. Comparing this with the time required to process
the shorter movie indicates that, in practice, the complexity of the event perception procedure depends
heavily on the number of figures and objects in the image.
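For a rough sense of scale, the per-frame cost of the longer run follows from the figures reported above: twelve hours for 172 frames. The comparison with the short movie assumes that ‘several seconds’ means roughly 3 to 5 seconds per frame.

```latex
\[
\frac{12\ \mathrm{h} \times 3600\ \mathrm{s/h}}{172\ \text{frames}}
  = \frac{43\,200\ \mathrm{s}}{172\ \text{frames}}
  \approx 251\ \mathrm{s/frame},
\qquad
\frac{251\ \mathrm{s}}{5\ \mathrm{s}} \approx 50,
\quad
\frac{251\ \mathrm{s}}{3\ \mathrm{s}} \approx 84 .
\]
```

Under that assumption, each frame of the longer movie took roughly fifty to eighty-five times as long to process as a frame of the short movie, which contains only 12 figures.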

I  will  not  discuss  Abigail's  analysis  of  the  longer  movie  in  depth  except  to  point  out  two  things. 
First,  one  major  event  that  takes  place  during  the  first  172  frames  is  John  picking  up  the  ball  off  the 
table.  The  perceptual  primitives  recovered  by  Abigail  form  a  solid  foundation  for  recognizing  this 
event.  Recall  the  definition  given  for  pick  up  in  chapter  7. 


The unreasonable amount of time required to process the longer movie significantly hindered the progress of this
research.




Figure 8.13: The event graph depicting the temporal structure of the perceptual primitives recovered
by Abigail after processing the first 172 frames of the movie discussed in section 6.1. Appendix C
enumerates the perceptual primitives associated with each edge in this graph.
(define pick-up (x y)
  (exists (i j z)
    (and (during i (supported y))
         (during i (supports z y))
         (during i (contacts z y))
         (during j (move (hand x)))
         (during j (contacts (hand x) y))
         (during j (attached (hand x) y))
         (during j (supports x y))
         (during j (move y))
         (not (equal z (hand x)))
         (= (end i) (beginning j)))))

If we take i to be the interval [0,65] and j to be the interval [66,71], the following perceptual primitives,
taken from appendix C, correspond very closely to the above definition.

(during i (supported y))
[0,71] (SUPPORTED [BALL-part])
[0,71] (SUPPORTED [(LINE-SEGMENT3 BALL)])

(during i (supports z y))
[0,65] (SUPPORTS [TABLE BOX-part] [BALL-part])
[0,71] (SUPPORTS [BALL-part] [(LINE-SEGMENT3 BALL)])

(during i (contacts z y))
[0,65] (CONTACTS [TABLE BOX-part] [BALL-part])

(during j (supports x y))
[66,71] (SUPPORTS [JOHN-part] [BALL-part])
[66,71] (SUPPORTS [(LINE-SEGMENT3 BALL)] [BALL-part])
[66,71] (SUPPORTS [BALL-part JOHN-part] [(LINE-SEGMENT3 BALL)])

(during j (move y))
[66,71] (TRANSLATING [BALL-part] PLACE-19)
[66,71] (MOVING-ROOT [BALL-part])
[66,71] (MOVING [BALL-part])
[66,71] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-17)
[66,71] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])
[66,71] (MOVING [(LINE-SEGMENT3 BALL)])

Note that if an object is supported (by another object) for an interval, say [0,71], then it is supported
for every subinterval of that interval, in particular [0,65]. Given this, Abigail has detected almost all
of the prerequisites for recognizing a pick up event. The only primitives not recognized are the following.

(during j (move (hand x)))

(during  j  (contacts  (hand  x)  y)) 

(during  j  (attached  (hand  x)  y)) 

Abigail has in fact detected these prerequisites as well. They simply do not appear in the event graph of
figure 8.13, because that graph depicts only those primitives which have ceased to hold by frame 172. John's hand
continues to move while grasping the ball well beyond frame 172. The above primitives will become part
of the event graph when these actions terminate.
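The matching argued above can be sketched in Python. The sketch assumes a hypothetical flat list of (primitive, interval) records transcribed from the primitives listed above. It relies on the homogeneity noted in the text: a primitive recorded over an interval also holds over every subinterval. Two further choices are assumptions of this sketch: (move y) is mapped to the MOVING primitive, and adjacent frames 65 and 66 are treated as meeting. The three hand-related conjuncts are omitted because they do not yet appear in figure 8.13.

```python
from typing import List, Tuple

Interval = Tuple[int, int]
Primitive = Tuple[str, ...]  # e.g. ("SUPPORTS", "[TABLE BOX-part]", "[BALL-part]")


def holds_during(db: List[Tuple[Primitive, Interval]], prim: Primitive, interval: Interval) -> bool:
    """True if prim was recorded over some interval containing `interval` (homogeneity)."""
    s, e = interval
    return any(p == prim and ps <= s and e <= pe for p, (ps, pe) in db)


db = [
    (("SUPPORTED", "[BALL-part]"), (0, 71)),
    (("SUPPORTS", "[TABLE BOX-part]", "[BALL-part]"), (0, 65)),
    (("CONTACTS", "[TABLE BOX-part]", "[BALL-part]"), (0, 65)),
    (("SUPPORTS", "[JOHN-part]", "[BALL-part]"), (66, 71)),
    (("MOVING", "[BALL-part]"), (66, 71)),
]
i, j = (0, 65), (66, 71)
conditions = {
    "(supported y)": (("SUPPORTED", "[BALL-part]"), i),
    "(supports z y)": (("SUPPORTS", "[TABLE BOX-part]", "[BALL-part]"), i),
    "(contacts z y)": (("CONTACTS", "[TABLE BOX-part]", "[BALL-part]"), i),
    "(supports x y)": (("SUPPORTS", "[JOHN-part]", "[BALL-part]"), j),
    "(move y)": (("MOVING", "[BALL-part]"), j),
}
results = {name: holds_during(db, prim, iv) for name, (prim, iv) in conditions.items()}
meets = i[1] + 1 == j[0]  # adjacent frames stand in for (= (end i) (beginning j))
print(all(results.values()), meets)
# True True
```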
A puzzling thing happens in Abigail's analysis of this movie. Abigail decides that the table is
unsupported in frame 172. This is indicated by the fact that the event graph contains an edge from frame 0
through frame 171 with the following perceptual primitives.

[0,171] (SUPPORTED [TABLE BOX-part])
[0,171] (SUPPORTED [(BOTTOM BOX)])
[0,171] (SUPPORTS [TABLE BOX-part] [(BOTTOM BOX)])

Inspection of the movie, however, reveals that the table remains supported throughout the entire movie.
What causes Abigail to suddenly decide that the table is unsupported in frame 172? Figure 8.14 depicts
the sequence of images that are part of Abigail's imagination of the short-term future for frame 172.
In this sequence, John falls over because he is unsupported. In doing so, the ball he is holding knocks against
the table. Abigail knows that John is on a different layer from the table, which allows him to walk
across the table without a substantiality violation. She also knows, however, that the ball is on the same layer as
the table, since in the past the table supported the ball. This allows John to raise the table up on one
leg by leaning on its edge with the ball. Since Abigail determines that something is unsupported if it
moves during imagination, she decides that the table is unsupported. This points out a deficiency in the
method used to determine support: an object may be supported even though an unrelated object could
knock it over during imagination. Methods to alleviate this problem are beyond the scope of this thesis.

I will discuss one further deficiency in Abigail's mechanism for perceiving support. Recall that
Abigail determines that an object A supports an object B if B is supported but loses that support
when A is removed. Figure 8.15 depicts a board supported by three tables. Since removing each table
individually will not cause the board to fall, Abigail would erroneously conclude that none of the tables
supports the board. This flaw is easily remedied by having Abigail consider all sets of objects A to see
if B falls when the entire set is removed. If so, then either the set can be taken as collectively supporting
the object, or support can be attributed to each member of the set individually.
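The set-based remedy can be sketched as a search over subsets of candidate supporters in increasing order of size. This is a hedged illustration. `falls_without` is a hypothetical oracle that, in Abigail, would correspond to one imagination call with the given set of objects removed. The rule used for figure 8.15 (the board stays up while at least two tables remain) was invented for the example. Keeping only minimal sets is an added choice that the text does not describe. The number of subsets grows exponentially with the number of candidates, so in practice the candidates would need to be restricted, for example to objects in contact with B.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set


def minimal_supporting_sets(candidates: Set[str],
                            falls_without: Callable[[FrozenSet[str]], bool]) -> List[FrozenSet[str]]:
    """Sets of candidates whose joint removal makes B fall, smallest first, minimal only."""
    found: List[FrozenSet[str]] = []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(sorted(candidates), size):
            removed = frozenset(subset)
            if any(smaller <= removed for smaller in found):
                continue  # a subset of this set already suffices; not minimal
            if falls_without(removed):  # in Abigail, one imagination call
                found.append(removed)
    return found


tables = {"table-1", "table-2", "table-3"}


def board_falls_without(removed: FrozenSet[str]) -> bool:
    """Invented rule for figure 8.15: the board falls once fewer than two tables remain."""
    return len(tables - removed) < 2


sets = minimal_supporting_sets(tables, board_falls_without)
supporters = set().union(*sets)  # attribute support to each member individually
print([sorted(s) for s in sets])
# [['table-1', 'table-2'], ['table-1', 'table-3'], ['table-2', 'table-3']]
```

No single table is found to be a supporter. Each pair of tables is found to support the board collectively, and the union of these pairs attributes support to every table individually.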

8.3  Experimental  Evidence 

As discussed previously, a major assumption underlying the design of Abigail is that people continually
imagine the short-term future, extrapolating perhaps a second or two ahead, as an ordinary
component of visual perception. Freyd and her colleagues have conducted a long series of experiments
(Freyd 1983, Freyd and Finke 1984, Finke and Freyd 1985, Freyd and Finke 1985, Finke et al. 1989,
Freyd 1987, Freyd and Johnson 1987, Kelly and Freyd 1987) that support this view. These experiments
share a common paradigm designed to demonstrate memory shift. Subjects are shown a sequence of
images which depict one or more objects in motion. They are then shown a test image and asked
whether the objects in the test image are in the same position as they were in the final image in the
pre-test sequence. Sometimes the objects are indeed in the same position and the correct response is
‘same’. At other times, however, the objects are displaced along the direction of motion implied by the
pre-test image sequence, in either a forward or reverse direction. In this case the correct response is
‘different’. Subjects uniformly give more incorrect responses for test images where the objects were
displaced further along the path of implied motion than for test images where the objects were displaced
in the reverse direction. In fact, in some experiments, subjects were more likely to give a ‘same’ response
for a slight forward displacement than for an image without any displacement. These experiments were
repeated, varying a number of parameters. These included the number of pre-test images, the number
of moving objects in the image sequence, the length of time each pre-test or test image was displayed,
the length of time between successive pre-test images or between the final pre-test
image and the test image, and whether the images were taken from real photographs or were computer-generated
abstractions such as rotating rectangles or moving dots. It appears that subjects' memory of an
(Panels: Frame 172, Imagination Step 5; Frame 172, Imagination Step 11)


Figure 8.14: The sequence of images produced by Abigail while imagining the short-term future of
frame 172 from the movie described in section 6.1. Abigail imagines that John will fall and knock
over the table. Due to a flaw in the method for determining support, Abigail concludes that the
table is unsupported.
Figure 8.15: Three tables collectively supporting a board. Abigail will currently fail to determine
that the tables support the board, since the board will not fall when each is removed individually.


object's position is shifted reliably as a result of an object's suggested motion. Freyd and her colleagues
attribute this memory shift to what they call a mental extrapolation of object movement. Through
statistical analysis of the error rates and reaction times for the various experimental tasks, they claim to
have demonstrated, among other things, that objects move progressively during extrapolation, that an
object's velocity during extrapolation is roughly equivalent to its final velocity implied by the pre-test
image sequence, and that it takes some time to stop the extrapolation process, with the time needed
being proportional to an object's final velocity during the pre-test image
sequence. They call this latter phenomenon representational momentum due to its similarity to physical
momentum.

In many of its details, the extrapolation process uncovered by Freyd and her colleagues differs from
the artificial imagination capacity incorporated into Abigail. As I will describe in chapter 9, Abigail's
imagination capacity has no notion of velocity or momentum. Nonetheless, I take the results of Freyd
and her colleagues as strong encouragement that the approach taken in this thesis is on the right track.

In  more  recent  work,  Freyd  et  al.  (1988)  report  evidence  that  the  human  extrapolation  process 
represents  forces,  such  as  gravity,  in  addition  to  velocities.  Furthermore,  they  report  evidence  for  the 
representation  of  forces  in  equilibrium,  even  for  static  images.  In  particular,  their  experiments  show  that 
subjects  who  perceive  essentially  static  images  with  forces  in  equilibrium,  such  as  one  object  supporting 
another,  extrapolate  motion  on  the  part  of  the  objects  in  those  images  when  the  equilibrium  is  disturbed, 
as  when  the  source  of  support  is  removed.  This  is  more  in  line  with  Abigail's  imagination  capacity. 

The experimental paradigm they used is similar to that used for the memory shift experiments. It is
depicted in figure 8.16. Subjects were shown a pre-test sequence of two images followed by a test image.
The first image in the pre-test sequence depicted a plant supported either by a stand or by a hook. The
plant appeared next to a window to allow subjects to gauge its vertical position. The second image
depicted the plant unsupported, with the stand or hook having disappeared. The test image was similar
to the second image except that in some instances, the plant was displaced upward or downward from
its position in the second image. Subjects viewed each image in the sequence for 250ms, with a 250ms
interval between images. They were asked to determine whether the test image depicted the plant in the
same position as the second image or in a different position.
Subjects made more errors in judging that the test image differed from the second image when the
test image depicted the plant in a lower position than the second image than when the test
image depicted the plant in a higher position. This result can be interpreted as indicating that subjects
imagined that the plant fell when its source of support was removed.

Abigail performs an analogous extrapolation when determining support relationships. She continually
performs counterfactual analyses, determining that an object is supported if it does not fall during
extrapolation. A second experiment reported by Freyd et al. (1988) indicates that humans do not perform
such analyses in all situations. This experiment is similar to the first experiment except that the
plant was also unsupported in the first image, i.e. it was unsupported throughout the image sequence.
The image sequence is depicted in figure 8.17. In this experiment, subjects demonstrated no memory
shift and thus no tendency to imagine the unsupported plant falling. It appears that a change in support
Figure 8.16: The image sequences shown to subjects as part of an experiment to demonstrate that
people represent forces in equilibrium when viewing static images. Reprinted from Freyd et al. (1988).
Figure 8.17: The image sequences shown to subjects as part of an experiment to demonstrate that
people don't always represent forces in equilibrium when viewing static images. Reprinted from
Freyd et al. (1988).

status  is  necessary  to  induce  the  imagined  falling.  Abigail’s  imagination  capacity  does  not  accurately 
reflect  this  last  result. 

To summarize, experiments reported by Freyd and her colleagues depict an active perceptual system,
forming the basis of our conceptual system, which has as its foundation an imagination capacity that
encodes naive physical knowledge. This capacity appears to be in place from very early infancy. This
view is most eloquently captured by the following excerpts from Freyd et al. (1988).

Much of what people encounter in everyday life is static from their point of reference: Cups
rest on desks, chairs sit on floors, and books stand on shelves. Perhaps it is the very pervasiveness
of static objects and still scenes that has been responsible for psychology's historical
focus on the perception of static qualities of the world: shape and form perception, pattern
recognition, picture perception, and object recognition. In apparent contrast to this focus,
there has been an increasingly popular emphasis on the perception of events, or patterns
of change in the world. There is a sense, however, in which the study of event perception
(e.g., J. J. Gibson, 1979) has shared some assumptions with the more traditional focus on
the perception of static stimuli. In both approaches event and dynamic stimuli have been
defined in terms of changes taking place in real time, whereas scenes that are not changing
in real time (or are being viewed by an observer who is not moving in real time) have been
considered simply static, that is, specifically not dynamic.

This view of static objects and scenes suggests that the perception of a static scene is devoid of
information about dynamic qualities of the world (which led J. J. Gibson, 1970, for instance,
to consider the perception of static scenes to be a mere laboratory curiosity). But if we take
dynamic to mean relating to physical force acting on objects with mass, then this view is
incorrect.

[emphasis in the original]

Having some sort of access to likely transformations by representing physical forces may help
solve a slightly different problem in object recognition: the problem of correctly identifying
a particular instantiation, or "token," as a member of a larger class, or type, of object. If
part of what one stores in memory about an object type is aspects of its likely behavior when
embedded in events, then representing physical forces operating on objects in a particular
perceptual situation may help in the process of identification of a particular object token.

[p.  m] 

Of course, to go correctly from visual input to a representation of forces, the underlying
representation system has to "know" something about physical forces and how they interact
with objects for a particular environment, such as the environment encountered on the surface
of the planet Earth. Such knowledge may be a function of the inherited or experientially
modified representational structure serving perception.

[p. 400]

Indeed, our view suggests that when people are viewing a static scene, lurking behind the
surface of consciousness is an inherently dynamic tension resulting from the representation
of forces in equilibrium. We see this dynamic tension as contributing to the conscious experience
of concreteness in perception and to the memory asymmetries we measure when the
equilibrium is disrupted.

[p.  407] 

Perhaps we might also be able to determine whether the present findings generalize to physical
situations beyond gravity, such as those where pressure (or even electromagnetic force)
dominates. However, we suspect that gravity is a better candidate for mental "internalization"
than other forces. Shepard (1981, 1984) has argued that the mind has internalized
characteristics of the world that have been most pervasive and enduring throughout evolution.
Although Shepard's (1981, 1984) list has emphasized kinematic, as opposed to dynamic,
transformations, the dynamic aspects of gravity are indeed pervasive and enduring characteristics
of the world.

[p.  40o] 

Although some might accept that the force of gravity and its simple opposing forces (Experiments
1-3) could be represented within the perceptual system, many might argue that
the representation of forces active in springs (Experiment 4) implicates real-world learning
and thus suggests that the basis of the effect is more central than perceptual. We suggest
two responses to this argument: First, perceptual knowledge of springlike behavior may be
innately given and not dependent on learning; second, evidence of perceptual learning is
not necessarily evidence against modularity. For both of these responses, we question the
assumption that the effect in Experiment 4 stems from knowledge of springs per se. It
might instead reflect perceptual knowledge of compressible and elastic substances, of which
springs are an example. DiSessa (1983) suggested that springiness is a phenomenological
primitive. E. J. Gibson, Owsley, Walker, and Megaw-Nyce (1979) found that 3-month-old
infants extract object rigidity or nonrigidity from motion, suggesting that people distinguish
compressible from noncompressible substances at a very early age.

[p. 406]

This thesis adopts the above view and takes it as motivation for the design of Abigail's perceptual
system.


8.4  Summary 

In chapter 7, I argued that the notions of support, contact, and attachment play a central role in the
definitions of simple spatial motion verbs such as throw, pick up, put, and walk. In this chapter, I
presented a theory of how these notions can be grounded in perception via counterfactual simulation.
An object is supported if it doesn't fall when the short-term future is imagined. One object supports
another object if the second is supported, but loses that support in a world imagined without the first
object. Two objects are attached if such attachment is needed to explain the fact that one supports the
other. Likewise, two objects must be in contact if one supports the other. A simple formulation of this
theory has been implemented as a computer program called Abigail that watches movies constructed
out of line segments and circles and produces descriptions of the objects and events depicted in those
movies. The events are characterized by the changing status of support, contact, and attachment
relations between objects. This chapter has illustrated how such relations could be recovered by using a
modular imagination capacity to perform the counterfactual simulations. The next chapter will discuss
the inner workings of this imagination capacity in greater detail.


* Experiments 1 and 2 correspond to figures 8.16 and 8.17, respectively. Experiment 3 extends experiments 1 and 2
in testing for representation of gravitational forces. Experiment 4 uses a similar experimental setup to test for the
representation of forces in a compressed spring as weights are placed on top of the spring and removed from it.


Chapter  9 

Naive  Physics 


Much of Abigail's event perception mechanism, and ultimately the lexical semantic representation she
uses to support language acquisition, relies on her capacity for imagining what will happen next in the
movie. This imagination capacity is used as part of a continual counterfactual 'what if' analysis to
support most of event perception. For example, Abigail infers that two figures are joined if one would
fall away from the other were they not joined. Knowing which figures are joined allows her to segment
the image into objects comprising sets of figures that are joined together. This ultimately allows the
grounding of the lexical semantic primitives (attached x y) and (in-existence x). Furthermore,
imagination plays a role in determining support relationships. Abigail infers that two figures are on
the same layer if one would fall through the other were they not on the same layer. This is required
to ground the lexical semantic primitive (contacts x y). Knowing that two figures are on the same
layer allows her to determine that one object supports another if the second would fall were the first
object removed. This ultimately allows the grounding of the lexical semantic primitives (supports x y)
and (supported x).
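The support definitions used here are counterfactual queries posed to the imagination capacity. The Python sketch below illustrates only that query structure. It replaces Abigail's simulator with a deliberately crude, hypothetical stand-in: each object just lists what it rests on, and an object stays up if a chain of resting relations reaches the ground through objects still present in the imagined world. The names `stays_up`, `supported`, and `supports` are illustrative and are not taken from the thesis.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy world: object name -> names of things it rests on ("ground" is always present).
RestsOn = Dict[str, Set[str]]


def stays_up(world: RestsOn, present: FrozenSet[str], obj: str,
             visiting: FrozenSet[str] = frozenset()) -> bool:
    """Toy stand-in for imagining the short-term future: does obj avoid falling?"""
    if obj in visiting:  # guard against cyclic resting relations
        return False
    for base in world.get(obj, set()):
        if base == "ground":
            return True
        if base in present and stays_up(world, present, base, visiting | {obj}):
            return True
    return False


def supported(world: RestsOn, present: FrozenSet[str], x: str) -> bool:
    # (supported x): x does not fall when the short-term future is imagined.
    return stays_up(world, present, x)


def supports(world: RestsOn, present: FrozenSet[str], x: str, y: str) -> bool:
    # (supports x y): y is supported, but loses that support in a world imagined without x.
    return supported(world, present, y) and not supported(world, present - {x}, y)


world = {"table": {"ground"}, "block": {"table"}}
scene = frozenset(world)
print(supports(world, scene, "table", "block"))  # True
print(supports(world, scene, "block", "table"))  # False
```

Each `supports` query costs one extra imagined world, which is why the number of simulations per frame grows with the number of object pairs.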

Abigail's  imagination  capacity  is  embodied  in  a  simulator  which  predicts  how  a  set  of  figures  will 
behave  under  the  influence  of  gravity.  Gravity  will  cause  the  figures  to  move  subject  to  several  constraints. 

joint  constraints:  Figures  that  are  joined  must  remain  joined.  The  values  of  rigid  joint  parameters 
must  be  preserved. 

substantiality: Two figures which are on the same layer must not overlap.

ground plane: No figure can overlap the line y = 0.

Furthermore,  each  of  these  constraints  is  subject  to  the  notion  of  continuity.  Not  only  must  all  figures 
uphold  the  joint,  substantiality,  and  ground  plane  constraints  in  their  final  resting  position,  they  must 
uphold these constraints continuously at all points along their path of motion. Figure 8.8 on page 141
gives an example of Abigail's imagination capacity in operation.

The problem of simulating the behavior of a set of components under the influence of forces subject to
constraints is not new. Much work on this problem has been done in the fields of mechanical engineering
and robotics, where it is called kinematic simulation of mechanisms. The classical approach
to kinematic simulation uses numerical integration.¹ Essentially, it is treated as an n-body problem
subject to constraints. Since the constraints are typically complex, it is difficult to derive an analytic,
closed-form method of preserving constraints during integration. Accordingly, the common approach
is to integrate using a small step size and repeatedly check for constraint violations. Preventing
constraint violations is often accomplished by modeling them as additional forces acting on the components.
Cremer's thesis (1989) is an example of recent work in kinematic simulation using numerical integration.

¹ Two notable exceptions to this are the work of Kramer (1990a, 1990b) and Funt (1980). I will discuss this work in
section 10.1.

The classical approach to kinematic simulation has certain merits. Up to the limits of numerical
accuracy, it faithfully models the Newtonian physics of a mechanism. This includes the velocity,
momentum, and kinetic energy of its components as well as the magnitude of forces collectively acting
on each component. It can handle arbitrary forces as well as arbitrary motion constraints. Except
where numerical methods break down at singularities, it accurately predicts the precise motion that
components undergo, the paths they follow, and their final resting place when the mechanism reaches
equilibrium.

While this classical approach to kinematic simulation is useful in mechanical engineering, it is less
suitable as a cognitive model of an innate imagination capacity, if one exists. The approach is both too
powerful and at the same time too weak. On one hand, people are not able to accurately predict the
precise paths taken by components of complex mechanisms. On the other hand, people do not appear to
be performing numerical integration with a small step size. Consider the mechanism shown in figure 9.1.
The mechanism consists of a ball attached to a rod which is joined to a stand. The joint is flexible,
allowing the rod to pivot and the ball to fall until it hits the table. The classical approach will simulate
such a mechanism by small repeated perturbations of the joint angle. After each perturbation, a
constraint check is performed to verify that the ball does not overlap the table. There is something
unsatisfying about this approach. People seem to be able to predict that the rod will pivot precisely the
amount needed to bring the ball into contact with the table.
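For the mechanism of figure 9.1, the contrast can be made concrete with a small Python sketch. It assumes a simplified geometry that is not taken from the thesis: the pivot sits at height h above the tabletop, the rod has length L from pivot to ball center, the ball has radius r, and θ is the rod angle below horizontal, starting at 0. The ball center is then at height h − L sin θ, so contact occurs in closed form at θ* = arcsin((h − r)/L), whereas a stepping simulator must creep toward θ* one perturbation at a time. The function names are hypothetical.

```python
import math


def contact_angle(h: float, L: float, r: float) -> float:
    """Closed form: rod angle below horizontal at which the ball first touches the table."""
    if not 0.0 <= h - r <= L:
        raise ValueError("ball must start above the table and be able to reach it")
    # Ball-center height is h - L*sin(theta); contact when it equals r.
    return math.asin((h - r) / L)


def stepped_contact_angle(h: float, L: float, r: float, step: float):
    """Classical style: perturb the joint angle by `step`, checking overlap after each one."""
    if not 0.0 <= h - r <= L:
        raise ValueError("ball must start above the table and be able to reach it")
    theta, steps = 0.0, 0
    # Accept the next perturbation only if the ball would still not overlap the table.
    while theta + step <= math.pi / 2 and h - L * math.sin(theta + step) >= r:
        theta += step
        steps += 1
    return theta, steps


exact = contact_angle(1.0, 2.0, 0.1)                     # one analytic step, about 0.4668 rad
approx, n = stepped_contact_angle(1.0, 2.0, 0.1, 0.001)
print(exact, approx, n)                                  # the stepped answer needs hundreds of checks
```

The stepped answer is only as precise as the step size, and its cost grows as the step shrinks; the closed form is exact and costs one evaluation.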

Using a small but nonzero step size has other consequences that conflict with the needs entailed by
using a kinematic simulator as part of a model of event perception. On the one hand, smaller step sizes slow
the numerical integration process. Current kinematic simulators typically operate two to three orders of
magnitude slower than real time. Event perception, however, must perform numerous simulations per
frame to support counterfactual analysis. As discussed in chapter 8, to determine support relationships
alone, a simulation must be performed for each pair of objects in the image to determine whether one
object falls when the other object is removed. To be cognitively plausible, or at least computationally
useful for event perception, the simulator incorporated into the imagination capacity must operate two
to three orders of magnitude faster than real time, not slower. Admittedly, the current implementation
is nowhere near that fast. Nonetheless, it does perform hundreds if not thousands of simulations during
the five to ten minutes it takes to process each movie frame.

Using a large step size to speed up the classical approach is likewise cognitively implausible. Large
step sizes raise the possibility of continuity violations. The configurations before and after an integration
step may both satisfy all of the constraints, yet there may be no continuous path for the components to
take to achieve that perturbation which does not violate some constraint. For example, if the ball in
figure 9.1 were smaller and the step size were larger than the diameter of the ball, a classical simulator
could err and predict that the ball would fall through the table. While in normal mechanical engineering
practice judicious choice of step size prevents such errors from occurring, there is something unsatisfying
about using the classical approach as a cognitive model. Irrespective of their size, people seem able to
uniformly predict that objects move along continuous paths until obstructed by obstacles.
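The failure mode just described is easy to reproduce. The Python sketch below uses a simplified, hypothetical setting rather than figure 9.1 itself: a ball of radius r falls straight down toward a table modeled as the line y = 0, and a classical-style simulator checks for overlap only at sampled positions. Once the step exceeds the ball's diameter, consecutive samples can straddle the table without either one overlapping it. A closed-form treatment of the same straight-line motion simply stops the ball when its lowest point reaches the table.

```python
def overlaps_table(y: float, r: float) -> bool:
    # A circle of radius r centered at height y overlaps the table line y = 0.
    return abs(y) < r


def sampled_fall(y0: float, r: float, step: float, floor: float = -10.0):
    """Classical style: move down in fixed steps, checking overlap only at sampled heights.
    Returns the last non-overlapping height, or None if the ball 'falls through'."""
    y = y0
    while y > floor:
        nxt = y - step
        if overlaps_table(nxt, r):
            return y            # the next step would overlap, so stop here
        y = nxt
    return None                 # never sampled an overlap: tunnelled through the table


def analytic_fall(y0: float, r: float):
    """Continuous, closed form: the ball stops when its lowest point reaches y = 0."""
    return r if y0 >= r else None


print(sampled_fall(5.2, 0.5, 0.1))   # stops within one step of the table
print(sampled_fall(5.2, 0.5, 1.5))   # None: step 1.5 exceeds the diameter 1.0
print(analytic_fall(5.2, 0.5))       # 0.5
```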

The kinematic simulator incorporated into Abigail uses very different methods than classical simulation,
with the objective of being both more faithful as a cognitive model and fast enough to support
event perception. It is motivated by the desire to simulate mechanisms like the one shown in figure 9.1 in a
single step (which it in fact does). To do so, it takes the cognitive notions of substantiality, continuity,
gravity, and ground plane to be primary, and Newtonian physical accuracy to be secondary. To simplify
the task of enforcing the cognitive constraints, Abigail's imagination capacity ignores many aspects of
physical reality and restricts the class of mechanisms it can simulate. First, the simulator ignores the
velocity of objects. This implies ignoring the effects of momentum and kinetic energy on object motion.


Figure 9.1: The simulator incorporated into Abigail's imagination capacity can predict in a single
step that the joint will pivot exactly the amount needed for the ball to land on the table. Classical
kinematic simulators based on numerical integration repeatedly vary the angle θ by a small step
size until the ball collides with the table. If the step size is too small, the simulation is slow. If the
step size is too large, the collision might not be detected, resulting in a simulation which violates the
substantiality and continuity constraints. Abigail never produces such an anomalous prediction.




Rather than integrating accelerations into velocities and positions, the simulator operates as an
optimizer, simply moving objects along paths which reduce their potential energy. Second, for the most part,
the simulator ignores the magnitude of forces acting on objects when computing their potential energy.
Objects simply move when forces are applied to them, in a direction which decreases their potential
energy. They don't move any faster when the force is greater, nor do objects necessarily move in a
direction which offers the greatest decrease in potential energy. Third, the simulator considers moving only
rigid objects, or rigid parts of objects, along linear or circular paths, one at a time, when attempting to
reduce their potential energy. Any mechanism which involves either motion along a more complex path
or simultaneous motion of multiple objects along different paths cannot be correctly simulated. This
precludes simulating mechanisms with closed-loop kinematic chains.² While these limitations make this
simulator inappropriate for traditional mechanical engineering tasks, at least the first two limitations
are inconsequential for the task of modeling the use of imagination to support event perception. The
third limitation does, however, cause some problems. These will be discussed in section 9.f.


9.1  Simulation  Framework 

Abigail simulates the imagined future of an image by moving sets of figures from that image along linear
and circular paths which reduce the potential energy of the set of moved figures. The potential energy
of a set of figures is simply the sum of the potential energies of each figure in that set. The potential
energy of a figure f is taken to be the product of its mass m(f) and the height of its center-of-mass y(f).
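As a worked illustration of this definition, the potential energy of a set F is the sum over f ∈ F of m(f)·y(f), with any gravitational constant omitted as in the text. The Python sketch below assumes a hypothetical `Figure` record that carries only a mass and a center-of-mass height; figures in Abigail also carry position, orientation, shape, and size. It also shows the acceptance test used throughout this chapter: a move is worth taking only if it lowers that sum.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Figure:
    """Hypothetical minimal figure: mass m(f) and center-of-mass height y(f) only."""
    name: str
    mass: float
    y: float


def potential_energy(figures: Iterable[Figure]) -> float:
    # Sum over f in F of m(f) * y(f); the gravitational constant is omitted.
    return sum(f.mass * f.y for f in figures)


def lowers_energy(before: Iterable[Figure], after: Iterable[Figure]) -> bool:
    # A candidate move is acceptable only if it strictly decreases potential energy.
    return potential_energy(after) < potential_energy(before)


rod = Figure("rod", mass=1.0, y=2.0)
ball = Figure("ball", mass=3.0, y=1.5)
print(potential_energy([rod, ball]))  # 6.5
```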

Abigail's kinematic simulator is a function I(𝓕, J, L, P) which takes as input a set of figures 𝓕,
along with a joint model J, a layer model L, and a predicate P.³ Each figure f ∈ 𝓕 has an observed
position, orientation, shape, and size as derived from the current movie frame.⁴ From this input,
I(𝓕, J, L, P) calculates a series of imagined positions and orientations for each f ∈ 𝓕. This series of
positions and orientations constitutes the motion predicted by Abigail for the figures under the influence
of gravity. I will denote the imagined positions and orientations of a figure f as x̃(p(f)), ỹ(p(f)), and θ̃(f)
in contrast to the observed positions and orientations x(p(f)), y(p(f)), and θ(f). I similarly extend such
notation to distances, displacements, and any other notion ultimately based on coordinates of figure
points. During imagination, Abigail applies the predicate P to the imagined positions and orientations
of the figures after moving each group of figures. If P(𝓕) ever returns true, then the simulation is halted
and I(𝓕, J, L, P) returns true. If P(𝓕) never returns true and the simulation reaches a state where
no further movement is possible, I(𝓕, J, L, P) returns false. Thus I(𝓕, J, L, P) can be interpreted as
asking whether P will happen imminently in the current situation.

During simulation, Abigail will move one set of figures while leaving the remaining figures
stationary. The set of moved figures will be called the foreground, while the stationary figures will be
called the background. I will denote the set of foreground figures as F and the set of background figures
as G. The sets F and G are disjoint. Their union constitutes the entire set of figures 𝓕 being imagined.
This might not be equivalent to the entire set of figures in the current movie frame, since Abigail often
imagines what would happen if certain figures were missing, as is the case when she tries to determine
whether one object supports another by imagining a world without the first object. Two kinds of
foreground motion are considered: translating F along a linear axis whose orientation is θ, and rotating
F about a pivot point p. The pivot point need not lie on any figure in F. In fact, it can be either inside
or outside the bounding area of F.

² A closed-loop kinematic chain is a set of components {c₁, ..., cₙ} where each cᵢ except cₙ is joined to cᵢ₊₁ and cₙ is
joined back to c₁.

³ This may appear to be circular since I(𝓕, J, L, P) takes joint and layer models as input and, in turn, is used to compute
joint and layer models according to the process described in section 8.2.1. This circularity is broken by calling I(𝓕, J, L, P)
with empty joint and layer models initially to compute the first joint and layer models, and using the previous models at
each frame to compute the updated models. Surprisingly, it usually takes Abigail only a single frame to converge to the
correct models.

⁴ Independent of the simplifying assumption discussed in section 8.1.2, during imagination the shapes and sizes of figures
must remain invariant to avoid producing degenerate predictions.

The simulator operates by repeatedly choosing some foreground F, and either translating F along
an appropriate axis θ, or rotating F about an appropriate pivot point p, as far as it can, so long as the
potential energy of F is continually decreased and the substantiality, ground plane, and joint constraints
are not violated. It terminates when it cannot find some foreground it can move to decrease its potential
energy. At each step of the simulation there may be several potential motions which could each reduce
the potential energy. For the most part, the choice of which one to take is somewhat arbitrary, though
there is a partial ordering bias which will be described shortly.
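The control structure of I(𝓕, J, L, P) described so far can be sketched in Python. The sketch is hypothetical in one important respect: everything that makes Abigail's simulator distinctive, namely choosing a foreground and an axis or pivot and computing the maximal analytic move, is hidden behind a caller-supplied `candidate_moves` function that yields successor states already moved as far as they can go. The `max_steps` budget is a safety guard added for the sketch and is not described in the text.

```python
from typing import Callable, Iterable, Optional, TypeVar

S = TypeVar("S")  # any representation of the imagined figure positions


def imagine(state: S,
            candidate_moves: Callable[[S], Iterable[S]],
            predicate: Callable[[S], bool],
            max_steps: int = 10_000) -> bool:
    """Return True if `predicate` becomes true during imagination, False at a dead end."""
    for _ in range(max_steps):
        # Take the first available energy-reducing move; the ordering bias is assumed
        # to be encoded in the order in which candidate_moves yields successors.
        nxt: Optional[S] = next(iter(candidate_moves(state)), None)
        if nxt is None:
            return False          # no foreground can move: the simulation has settled
        state = nxt
        if predicate(state):      # P is tested after moving each group of figures
            return True
    raise RuntimeError("step budget exhausted")


# Toy usage: independent blocks that drop straight to the ground (height 0) one at a time.
def drop_one(heights):
    for i, h in enumerate(heights):
        if h > 0:
            yield heights[:i] + (0.0,) + heights[i + 1:]
            return


print(imagine((3.0, 0.0, 2.0), drop_one, lambda s: s[2] == 0.0))  # True
print(imagine((3.0, 0.0, 2.0), drop_one, lambda s: s[1] > 0.0))   # False
```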

The key facet of this simulation algorithm is that at each step, the foreground is translated or rotated
as far as possible subject to the requirements that potential energy continually decrease and constraints
be maintained. Limiting all motion to be linear or circular, and limiting figure shapes to be line segments
and circles, allows closed-form analytic determination of the maximum movement possible during that
step. Later in this section, I will discuss this fairly complex closed-form solution.

At each simulation step, Abigail must choose an appropriate foreground F, decide whether to
translate or rotate F, and choose an appropriate axis θ for the translation or pivot point p for the
rotation. Having made these choices, the maximum movement c is analytically determined. Choosing
the type of movement (F, and θ or p), however, involves search. Abigail considers the following six
possibilities in order.

Translating an object downwards. In this case F consists of a set of figures connected by joints
and θ = −π/2. There must be no joint between any foreground and background figures. Thus F
must be a connected component in the connection graph whose vertices are figures and edges are
joints between pairs of figures.

Sliding an object along an inclined surface. In this case F consists of a connected component in
the connection graph and θ is either the orientation θ(f), or the opposite orientation θ(f) + π,
whichever is negative when normalized, of some line segment f such that either

1. f is in the foreground and is coincident with a line segment g in the background,

2. f is in the background and touches a line segment g in the foreground at an endpoint of g,

3. f is in the background and is tangent to a circle g in the foreground,

4. f is in the foreground and touches a line segment g in the background at an endpoint of g, or

5. f is in the foreground and is tangent to a circle g in the background,

as long as f ⋈ g. No other translation axes need be considered for this case. Furthermore,
neither vertical nor horizontal translation axes need be considered, since vertical translation axes
fall under the previous case, and horizontal motion will never reduce the potential energy of an
object. Figures 9.2(a) through 9.2(e) depict cases 1 through 5, respectively. These cases may at
times yield multiple potential sliding axes for a given foreground, as demonstrated in figures 9.2(f)
through 9.2(h). In figure 9.2(f) these degenerate to the same axis. In both figures 9.2(g) and 9.2(h)
only one of the two axes allows unblocked movement. In general, when multiple sliding axes are
predicted, they will either be degenerate, or all but one will be blocked.

Figure 9.2: Determining the potential axes θ of sliding. A foreground might slide relative to a
background along any line segment from the foreground which either is coincident with some line
segment, touches the endpoint of some line segment, or is tangent to some circle, in the background,
or along any line segment from the background with an analogous relationship to a figure in the
foreground. Other axes, including the orientations of unrelated line segments, line segments which
touch other figures in ways other than those specified above, or line segments which don't touch
across the foreground and background boundary, need not be considered.

An object falling over. In this case F consists of a connected component in the connection graph
and p is either

1. an endpoint of a line segment from F if the endpoint lies on the ground,

2. an endpoint of a line segment f from F if the endpoint lies on a figure g from G and f ⋈ g,

3. an endpoint of a line segment g from G if the endpoint lies on a figure f from F and f ⋈ g,

4. the center of a circle from F if the circle touches the ground,

5. the center of a circle f from F if the circle touches a figure g from G and f ⋈ g, or

6. the center of a circle g from G if the circle touches a figure f from F and f ⋈ g.

No other pivot points need be considered for this case. Figures 9.3(a) through 9.3(f) depict cases 1
through 6, respectively.

Varying a flexible rotation parameter of a joint. If j is a joint with a flexible rotation parameter
that connects two parts of an object that are otherwise unconnected, then it is possible to rotate
either part about the joint pivot. In this case F can be any connected component in the connection
graph computed without j such that F contains either f(j) or g(j). The only pivot point which
need be considered is p(j), the point where the two figures are joined. If j is not part of a closed-loop
kinematic chain, then there will always be exactly two such foregrounds F, one for each subpart
connected by j. One subpart will contain f(j) while the other will contain g(j). If j is part of
a closed-loop kinematic chain, then there will be a single such foreground F containing both f(j)
and g(j). Abigail detects this case and simply does not consider rotating about flexible joints in
closed-loop kinematic chains. This amounts to treating all closed-loop kinematic chains as rigid
bodies.

Varying a flexible translational displacement parameter of a joint. If j is a joint whose
translational displacement along f(j) is flexible and f(j) is a line segment, then it is possible to
translate either part connected by j along f(j). In this case only the orientation θ(f(j)), or the
opposite orientation θ(f(j)) + π, whichever is negative when normalized, need be considered as a
possible translation axis. Likewise, if the translational displacement along g(j) is flexible and g(j)
is a line segment, then it is possible to translate either part connected by j along g(j). In this case
only the orientation θ(g(j)), or the opposite orientation θ(g(j)) + π, whichever is negative when
normalized, need be considered as a possible translation axis. In both cases, the translation is
limited to the distance between p(j) and the appropriate endpoint of the line segment along which
the translation is taken. The limits imposed by this constraint are computed analytically and
combined with the limits implied by the substantiality and ground plane constraints. The
foreground F is computed in the same way as for the aforementioned case of varying a flexible
rotation parameter and is limited to varying joints which do not participate in closed-loop
kinematic chains.

Varying a flexible rotational displacement parameter of a joint. If j is a joint whose rotational
displacement about the center of f(j) is flexible and f(j) is a circle, then it is possible to rotate
either part connected by j about the center of f(j). In this case the only pivot point that need be
considered is p(f(j)). Likewise, if the rotational displacement about the center of g(j) is flexible
and g(j) is a circle, then it is possible to rotate either part connected by j about the center
of g(j). In this case the only pivot point that need be considered is p(g(j)). The foreground F is
computed in the same way as for the case of varying a rotation parameter and is limited to varying
joints which do not participate in closed-loop kinematic chains.
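Several of the cases above select foregrounds from the connection graph, whose vertices are figures and whose edges are joints. The Python sketch below is a minimal, hypothetical rendering of that bookkeeping: joints are plain (f(j), g(j)) pairs with no parameters, each pair is assumed to appear once, and the helper names are illustrative. It computes connected components, which serve as foregrounds for the downward-translation, sliding, and falling-over cases, and the foregrounds for rotating about a flexible joint j, returning nothing when j closes a loop, in line with Abigail's decision to treat closed-loop chains as rigid.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Joint = Tuple[Hashable, Hashable]  # hypothetical: (f(j), g(j)), assumed unique per joint


def connected_components(figures: Iterable[Hashable],
                         joints: Iterable[Joint]) -> List[FrozenSet[Hashable]]:
    """Components of the connection graph (vertices: figures, edges: joints)."""
    adjacent: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for f, g in joints:
        adjacent[f].add(g)
        adjacent[g].add(f)
    seen: Set[Hashable] = set()
    components: List[FrozenSet[Hashable]] = []
    for start in figures:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            v = stack.pop()
            component.add(v)
            for w in adjacent[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(frozenset(component))
    return components


def flexible_joint_foregrounds(figures, joints, j: Joint) -> List[FrozenSet[Hashable]]:
    """Foregrounds for rotating about flexible joint j: components of the graph without j
    that contain f(j) or g(j). Empty if j lies on a closed loop (one shared component)."""
    remaining = [e for e in joints if e != j]
    f_j, g_j = j
    hits = [c for c in connected_components(figures, remaining) if f_j in c or g_j in c]
    return hits if len(hits) == 2 else []


figures = ["stand", "rod", "ball"]
joints = [("stand", "rod"), ("rod", "ball")]
print(flexible_joint_foregrounds(figures, joints, ("stand", "rod")))
```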

Currently, only the first four cases are implemented. Varying displacement parameters of joints is not
implemented, though it is not conceptually difficult to do so.

Having chosen a foreground F, and whether to translate F along a chosen axis θ or to rotate F
about a chosen pivot point p, the simulator must now determine c, the amount of the translation or
rotation. As mentioned previously, the simulator will always translate or rotate the foreground as far
as it will go, in a single analytic step, until one of two conditions occurs: either further translation or
rotation will no longer decrease the potential energy, or a barrier prevents further movement. There are
two kinds of barriers: the ground, via the ground plane constraint, and another figure on the same layer,
via the substantiality constraint.

Figure 9.3: Determining the potential pivot points p about which an object may rotate when falling
over. When falling over, an object can pivot only about a point touching the ground or another
object. No other pivot points need be considered.

Determining when further translation or rotation will no longer decrease the potential energy is easy.
For translation along an axis θ, there is no limit. So long as the axis of translation θ is negative when
normalized, further downward translation of F will always decrease the potential energy of F. Upward
translations, where θ is positive, need never be considered since they can only increase the potential
energy. Likewise, horizontal translations, where θ = 0 or θ = π, need not be considered since they will
not affect the potential energy.⁵ For rotation about a pivot point p, the appropriate limit is the rotation
which would bring the center-of-mass of F directly below p. This rotation can be calculated as follows.
First compute the center-of-mass of F, which I will denote as μ(F):


$$x(\mu(F)) = \frac{\sum_{f \in F} m(f)\,x(f)}{\sum_{f \in F} m(f)} \qquad y(\mu(F)) = \frac{\sum_{f \in F} m(f)\,y(f)}{\sum_{f \in F} m(f)}$$

Then compute the orientation of the line from the pivot point p to this center-of-mass μ(F). This
is θ(p, μ(F)). The desired rotation limit is −π/2 − θ(p, μ(F)). If this value is zero when normalized, then
no rotation of F about the pivot point p will reduce the potential energy of F, so such a rotation is
not considered. If the value is negative when normalized, then only a clockwise rotation can reduce
the potential energy of F. If the value is positive but not equal to π when normalized, then only a
counterclockwise rotation can reduce the potential energy of F. If the value is π when normalized, then
the choice of rotation direction is indeterminate, since either a clockwise or counterclockwise rotation will
reduce the potential energy. In this case a counterclockwise rotation is chosen arbitrarily. Furthermore,
if the pivot point p is coincident with the center-of-mass μ(F), then no rotation of F about the pivot
point p will reduce the potential energy of F, so again, such a rotation is not considered. Since the
magnitude of a rotation need never be greater than π, we can represent clockwise rotations as negative
normalized rotations and counterclockwise rotations as positive normalized rotations.
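These rules can be checked with a short Python sketch. It assumes, beyond the text, that each foreground figure is summarized as a (mass, x, y) triple for its own center-of-mass, that orientations are computed with `math.atan2`, and that normalization maps angles into (−π, π], matching the convention that a leftward orientation is +π. The helper names are hypothetical. The returned value is the rotation that brings μ(F) directly below p, namely −π/2 − θ(p, μ(F)) normalized, with negative meaning clockwise; `None` marks the cases where no rotation about p lowers the potential energy.

```python
import math
from typing import Optional, Sequence, Tuple


def normalize(angle: float) -> float:
    """Map an angle into (-pi, pi], so a leftward orientation is +pi rather than -pi."""
    a = math.fmod(angle, 2 * math.pi)
    if a <= -math.pi:
        a += 2 * math.pi
    elif a > math.pi:
        a -= 2 * math.pi
    return a


def center_of_mass(figs: Sequence[Tuple[float, float, float]]) -> Tuple[float, float]:
    """mu(F) from (mass, x, y) triples, one per figure."""
    total = sum(m for m, _, _ in figs)
    return (sum(m * x for m, x, _ in figs) / total,
            sum(m * y for m, _, y in figs) / total)


def rotation_limit(pivot: Tuple[float, float],
                   figs: Sequence[Tuple[float, float, float]]) -> Optional[float]:
    """Signed rotation bringing mu(F) directly below the pivot (negative = clockwise)."""
    mx, my = center_of_mass(figs)
    px, py = pivot
    if math.isclose(mx, px, abs_tol=1e-12) and math.isclose(my, py, abs_tol=1e-12):
        return None                                 # pivot coincides with the center-of-mass
    theta = math.atan2(my - py, mx - px)            # theta(p, mu(F))
    limit = normalize(-math.pi / 2 - theta)
    if math.isclose(limit, 0.0, abs_tol=1e-12):
        return None                                 # mu(F) is already directly below p
    return limit                                    # +pi: indeterminate, counterclockwise chosen


print(rotation_limit((0.0, 0.0), [(1.0, 1.0, 0.0)]))   # mass to the right: clockwise, -pi/2
print(rotation_limit((0.0, 0.0), [(1.0, -1.0, 0.0)]))  # mass to the left: counterclockwise, +pi/2
```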


9.2  Translation  and  Rotation  Limits 

Determining  the  translation  and  rotation  limits  that  result  from  barriers  is  more  complex.  In  essence, 
the  following  procedures  are  needed. 

• (aggregate-translation-limit F G θ)

• (aggregate-clockwise-rotation-limit F G p)

• (aggregate-counterclockwise-rotation-limit F G p)

These determine the maximum translation or rotation c that can be applied to a foreground F until
it collides with either the ground or the background G. Translating or rotating a foreground F
will translate or rotate each figure f ∈ F along the same axis θ or about the same pivot point p. A
foreground F can be translated or rotated until any one of its figures f ∈ F is blocked either by the
ground or by some figure g ∈ G such that f and g are on the same layer.⁶ Being blocked by the
ground, i.e. the ground plane constraint, can be handled as a variation of the substantiality constraint
by temporarily treating the ground as a sufficiently long line segment that is on the same layer as
every figure in the foreground. Thus the above procedures, which compute movement limits for a whole
foreground, can be implemented in terms of procedures which compute limits for individual figures via
the template given below.⁷

⁵ The reason angles are normalized so that a leftward orientation is +π and not −π is so that only downward translation
axes are negative.

⁶ Recall that Abigail assumes that two figures are on different layers unless she has explicit reason to believe that they
are on the same layer.

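(defun aggregate-type-limit (foreground background axis-or-pivot)
  ;; Template: TYPE is translation, clockwise-rotation, or counterclockwise-rotation.
  ;; Distinct parameter names avoid the case-folding clash between f and F in Common Lisp.
  (iterate outer
    (for f in foreground)
    ;; The ground behaves like a long segment on the same layer as every figure.
    (minimize (type-limit f *ground* axis-or-pivot))
    ;; Background figures only block foreground figures on the same layer.
    (iterate (for g in background)
      (when (same-layer? f g)
        (in outer (minimize (type-limit f g axis-or-pivot)))))))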

where type is either translation, clockwise-rotation, or counterclockwise-rotation. To implement
the functions

• translation-limit,

• clockwise-rotation-limit, and

• counterclockwise-rotation-limit

which compute movement limits for individual figures, eight major cases must be considered.

1. Translating a line segment f until blocked by another line segment g.

2. Translating a circle f until blocked by a line segment g.

3. Translating a line segment f until blocked by a circle g.

4. Translating a circle f until blocked by another circle g.

5. Rotating a line segment f until blocked by another line segment g.

6. Rotating a circle f until blocked by a line segment g.

7. Rotating a line segment f until blocked by a circle g.

8. Rotating a circle f until blocked by another circle g.

Each of these eight cases contains a number of subcases. Many of these cases and subcases compute the amount that f may move until blocked by g by instead computing the amount that g may move in the opposite direction until blocked by f. Translations in the opposite direction involve a translation axis whose orientation is $\theta + \pi$ instead of $\theta$. Rotations in the opposite direction return clockwise limits as counterclockwise ones and vice versa. I will now consider each of these eight major cases individually, along with their subcases.

Translating a line segment f until blocked by another line segment g.

This case contains four subcases, all of which must be considered. The tightest limit returned by any of the subcases is the limit returned by this case.

‡ This code fragment uses the iterate macro introduced by Amsterdam (1990).

Figure 9.4: Translating a line segment f until its endpoint p(f) touches another line segment g.


Translating f until its endpoint p(f) touches g.

This subcase is depicted in figure 9.4. Project a ray r from p(f) along the axis $\theta$. This ray will be called a translation ray. If r does not intersect g then this subcase does not limit the translation of f along the axis $\theta$. However, if r does intersect g at $p_1$ then the distance from p(f) to $p_1$ is a limit on the translation of F along the axis $\theta$. The position of f after the translation is depicted as $f_1$ in figure 9.4.
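Geometrically, this subcase is a ray/segment intersection. The Python sketch below shows one way to compute it; the function name `ray_segment_hit`, the (x, y) tuple representation of points, and the tolerance `eps` are illustrative assumptions, not Abigail's actual code. It returns the distance from p(f) to $p_1$, or None when r misses g. A ray parallel to g also returns None here, since that coincident configuration is a degenerate case the general analysis does not cover.

```python
import math

def ray_segment_hit(origin, theta, a, b, eps=1e-12):
    """Distance along the ray from `origin` with orientation `theta` to the
    segment from `a` to `b`, or None if the ray misses the segment.
    Hypothetical helper: points are (x, y) tuples."""
    dx, dy = math.cos(theta), math.sin(theta)   # unit direction of the ray
    ex, ey = b[0] - a[0], b[1] - a[1]           # direction of the segment
    denom = dx * ey - dy * ex                   # cross product d x e
    if abs(denom) < eps:
        return None  # parallel or coincident: not handled by the general case
    wx, wy = a[0] - origin[0], a[1] - origin[1]
    t = (wx * ey - wy * ex) / denom   # distance along the ray
    s = (wx * dy - wy * dx) / denom   # position along the segment, 0..1
    if t >= -eps and -eps <= s <= 1 + eps:
        return max(t, 0.0)
    return None

# Endpoint p(f) = (0, 5) translated straight down onto a floor segment g.
print(ray_segment_hit((0.0, 5.0), -math.pi / 2, (-1.0, 0.0), (1.0, 0.0)))  # prints 5.0
```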

This subcase has a boundary case to consider when the translation ray r intersects g at one of its endpoints. If r intersects p(g) then g limits the translation of f only when $|\theta(f) - \theta(g)| < \pi/2$ when normalized. Likewise, if r intersects q(g) then g limits the translation of f only when $|\theta(f) - \theta(q(g), p(g))| < \pi/2$ when normalized. These boundary cases are illustrated in figure 9.5, where the endpoint p(g) of line segment g limits the translation of f but not the translation of f'.
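The boundary rule compares two orientations after normalization. A minimal sketch, assuming angles are normalized into $(-\pi, \pi]$ (so that a leftward orientation is $+\pi$) and that the caller passes $\theta(g)$ when r meets p(g) or $\theta(q(g), p(g))$ when r meets q(g); `normalize`, `orientation`, and `endpoint_limits` are hypothetical names.

```python
import math

def normalize(angle):
    """Map an angle into (-pi, pi], so a leftward orientation is +pi."""
    a = math.fmod(angle, 2 * math.pi)
    if a <= -math.pi:
        a += 2 * math.pi
    elif a > math.pi:
        a -= 2 * math.pi
    return a

def orientation(p1, p2):
    """theta(p1, p2): orientation of the line from p1 to p2."""
    return math.atan2(p2[1] - p1[1], p2[0] - p1[0])

def endpoint_limits(theta_f, theta_g):
    """True when the endpoint of g met by the translation ray should limit f,
    i.e. when |theta(f) - theta(g)| < pi/2 after normalization."""
    return abs(normalize(theta_f - theta_g)) < math.pi / 2

f = ((0.0, 2.0), (2.0, 2.0))   # p(f), q(f)
g = ((0.0, 0.0), (1.0, 1.0))   # p(g), q(g); r from p(f) straight down meets p(g)
print(endpoint_limits(orientation(*f), orientation(g[0], g[1])))  # prints True
```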

Translating f until its endpoint q(f) touches g.

This subcase is analogous to the first subcase except that the translation ray is projected from q(f) instead of p(f).

Translating f until it touches the endpoint p(g).

This subcase reduces to the first subcase by translating g in the opposite direction $\theta + \pi$ until p(g) touches f.

Translating f until it touches the endpoint q(g).

This subcase reduces to the second subcase by translating g in the opposite direction $\theta + \pi$ until q(g) touches f.
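Taken together, the four subcases yield the translation limit of one line segment against another. In this hedged sketch, the third and fourth subcases use the reversal described earlier: rays are projected from the endpoints of g along $\theta + \pi$ toward f. The helper names are hypothetical, the endpoint boundary test is omitted for brevity, and `math.inf` stands for "g does not limit f".

```python
import math

def ray_segment_hit(origin, theta, a, b, eps=1e-12):
    """Distance along a ray to segment ab, or None (parallel rays ignored)."""
    dx, dy = math.cos(theta), math.sin(theta)
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = dx * ey - dy * ex
    if abs(denom) < eps:
        return None
    wx, wy = a[0] - origin[0], a[1] - origin[1]
    t = (wx * ey - wy * ex) / denom
    s = (wx * dy - wy * dx) / denom
    return max(t, 0.0) if t >= -eps and -eps <= s <= 1 + eps else None

def segment_segment_translation_limit(f, g, theta):
    """Tightest of the four subcases for translating segment f along theta
    until blocked by segment g; each segment is a (p, q) pair of endpoints."""
    (pf, qf), (pg, qg) = f, g
    back = theta + math.pi                    # the opposite direction
    candidates = [
        ray_segment_hit(pf, theta, pg, qg),   # p(f) touches g
        ray_segment_hit(qf, theta, pg, qg),   # q(f) touches g
        ray_segment_hit(pg, back, pf, qf),    # f touches p(g), via reversal
        ray_segment_hit(qg, back, pf, qf),    # f touches q(g), via reversal
    ]
    hits = [c for c in candidates if c is not None]
    return min(hits) if hits else math.inf

# A horizontal bar falling onto the upper endpoint of a slanted segment.
print(segment_segment_translation_limit(((0.0, 3.0), (4.0, 3.0)),
                                        ((1.0, 0.0), (2.0, -5.0)),
                                        -math.pi / 2))  # prints 3.0
```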

Translating a circle f until blocked by a line segment g.

This case contains three subcases, all of which must be considered. The tightest limit returned by any of the subcases is the limit returned by this case.

Translating f until it is tangent to g.

This subcase is depicted in figure 9.6. Construct two line segments, $g_1$ and $g_2$, parallel to and on either side of the line segment g, separated from g by a distance equal to the radius of the circle f. The endpoints of $g_1$ and $g_2$ are those that result from moving the endpoints of g a distance equal to the radius of f along axes perpendicular to g. The line segments $g_1$ and $g_2$ are the potential loci of the center of the circle f if it were tangent to g. Project a translation ray r from the center p(f) of the circle along the axis $\theta$. If r intersects neither $g_1$ nor $g_2$ then this subcase does not limit the translation of F along the axis $\theta$. However, if r does intersect $g_1$ at $p_1$ then the distance from p(f) to $p_1$ is a limit on the translation of F along the axis $\theta$. Likewise, if r intersects $g_2$ at $p_2$ then the distance from p(f) to $p_2$ is a limit on the translation of F along the axis $\theta$. The position of f after the translation is depicted as $f_1$ in figure 9.6.

Figure 9.6: Translating a circle f until it is tangent to a line segment g.
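The construction of $g_1$ and $g_2$ can be sketched as follows, assuming segments are given as pairs of (x, y) endpoints. The names `offset_segments` and `circle_tangent_to_segment_limit` are hypothetical, and the ray helper is repeated so the block runs on its own; taking the smaller of the two distances is our reading of the rule that the tightest limit wins.

```python
import math

def ray_segment_hit(origin, theta, a, b, eps=1e-12):
    """Distance along a ray to segment ab, or None (parallel rays ignored)."""
    dx, dy = math.cos(theta), math.sin(theta)
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = dx * ey - dy * ex
    if abs(denom) < eps:
        return None
    wx, wy = a[0] - origin[0], a[1] - origin[1]
    t = (wx * ey - wy * ex) / denom
    s = (wx * dy - wy * dx) / denom
    return max(t, 0.0) if t >= -eps and -eps <= s <= 1 + eps else None

def offset_segments(a, b, r):
    """The loci g1 and g2: copies of segment ab shifted by r along its normal."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    length = math.hypot(ex, ey)
    nx, ny = -ey / length * r, ex / length * r   # normal vector of length r
    g1 = ((a[0] + nx, a[1] + ny), (b[0] + nx, b[1] + ny))
    g2 = ((a[0] - nx, a[1] - ny), (b[0] - nx, b[1] - ny))
    return g1, g2

def circle_tangent_to_segment_limit(center, radius, a, b, theta):
    """Distance the circle's center can travel along theta before the circle
    becomes tangent to segment ab, or None if neither locus is hit."""
    hits = []
    for start, end in offset_segments(a, b, radius):
        t = ray_segment_hit(center, theta, start, end)
        if t is not None:
            hits.append(t)
    return min(hits) if hits else None
```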

Translating f until it touches the endpoint p(g).

This subcase reduces to the second subcase of the next case by translating the line segment g in the opposite direction $\theta + \pi$ until its endpoint p(g) touches the circle f.

Translating f until it touches the endpoint q(g).

This subcase reduces to the third subcase of the next case by translating the line segment g in the opposite direction $\theta + \pi$ until its endpoint q(g) touches the circle f.

Translating a line segment f until blocked by a circle g.

This case contains three subcases, all of which must be considered. The tightest limit returned by any of the subcases is the limit returned by this case.

Translating f until it is tangent to g.

This subcase reduces to the first subcase of the previous case by translating the circle g in the opposite direction $\theta + \pi$ until it is tangent to the line segment f.
Figure 9.7: Translating a line segment f until its endpoint p(f) touches a circle g.

Translating f until its endpoint p(f) touches g.

This subcase is depicted in figures 9.7 and 9.8. Project a translation ray r from the endpoint p(f) along the axis $\theta$. If r does not intersect the circle g then this subcase does not limit the translation of F along the axis $\theta$. However, if r does intersect g at one point $p_1$, as it does in figure 9.7, then the distance from p(f) to $p_1$ is a limit on the translation of F along the axis $\theta$. If r intersects g at two points $p_1$ and $p_2$, as it does in figure 9.8, then the shorter of the distances from p(f) to $p_1$ and from p(f) to $p_2$ is a limit on the translation of F along the axis $\theta$. The position of f after the translation is depicted as $f_1$ in figures 9.7 and 9.8.
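Both the one-point and the two-point configurations reduce to solving a quadratic for the distance t along the ray and keeping only non-negative roots; the smallest surviving root is the limit. A minimal Python sketch with hypothetical names (`ray_circle_hits`, `endpoint_to_circle_limit`), assuming points are (x, y) tuples.

```python
import math

def ray_circle_hits(origin, theta, center, radius):
    """Sorted non-negative distances at which the ray meets the circle.
    Solves |w + t*d|^2 = radius^2 with w = origin - center, |d| = 1."""
    dx, dy = math.cos(theta), math.sin(theta)
    wx, wy = origin[0] - center[0], origin[1] - center[1]
    b = wx * dx + wy * dy                  # half the linear coefficient
    c = wx * wx + wy * wy - radius * radius
    disc = b * b - c
    if disc < 0:
        return []                          # the ray's line misses the circle
    root = math.sqrt(disc)
    return sorted(t for t in (-b - root, -b + root) if t >= 0)

def endpoint_to_circle_limit(endpoint, theta, center, radius):
    """Translation limit for an endpoint moving along theta toward circle g."""
    hits = ray_circle_hits(endpoint, theta, center, radius)
    return hits[0] if hits else None
```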

Translating f until its endpoint q(f) touches g.

This subcase is analogous to the second subcase except that the translation ray is projected from q(f) instead of p(f).

Translating a circle f until blocked by another circle g.

This case contains three disjoint subcases. The applicable subcase can be determined analytically by examining the centers and radii of the circles f and g.*


* In the anomalous situation where f and g are equiradial and concentric, either the second or the third subcase can be used.
Figure 9.9: Translating a circle f until blocked by another circle g when f and g are outside each other.

The circles are outside each other.

This subcase is depicted in figure 9.9. In this subcase the circle f is translated until it is tangent to and outside the circle g. Construct a circle $g_1$, concentric with g, whose radius is the sum of the radii of f and g. Project a translation ray r from the center p(f) of f along the axis $\theta$. If r does not intersect $g_1$ then this subcase does not limit the translation of F along the axis $\theta$. However, if r does intersect $g_1$ then it will do so at two points, $p_1$ and $p_2$, which may degenerate to the same point. The shorter of the distances from p(f) to $p_1$ and from p(f) to $p_2$ is a limit on the translation of F along the axis $\theta$. The position of f after the translation is depicted as $f_1$ in figure 9.9.

The circle f is inside g.

This subcase is depicted in figure 9.10. In this subcase the circle f is translated until it is tangent to and inside the circle g. Construct a circle $g_1$, concentric with g, whose radius is the radius of g minus the radius of f. Project a translation ray r from the center p(f) of f along the axis $\theta$. Note that r must intersect $g_1$ at a single point $p_1$, because p(f) lies inside $g_1$. The distance from p(f) to $p_1$ is a limit on the translation of F along the axis $\theta$. The position of f after the translation is depicted as $f_1$ in figure 9.10.

The circle g is inside f.

This subcase reduces to the second subcase by translating g in the opposite direction $\theta + \pi$ until blocked by f.
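The three disjoint subcases can be selected from the distance d between the centers and the radii $r_f$ and $r_g$. One way to write the selection and the resulting limits, under stated assumptions: the tests $d + r_f \le r_g$ (f inside g) and $d + r_g \le r_f$ (g inside f) are our formulation of "analytically", circles are given as (center, radius), and every name is hypothetical. In the inside cases the farthest root is used, because the center of the inner circle starts inside the shrunken circle and exits it once.

```python
import math

def ray_circle_hits(origin, theta, center, radius):
    """Sorted non-negative distances at which the ray meets the circle."""
    dx, dy = math.cos(theta), math.sin(theta)
    wx, wy = origin[0] - center[0], origin[1] - center[1]
    b = wx * dx + wy * dy
    c = wx * wx + wy * wy - radius * radius
    disc = b * b - c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted(t for t in (-b - root, -b + root) if t >= 0)

def circle_circle_translation_limit(cf, rf, cg, rg, theta):
    """Limit on translating circle f (center cf, radius rf) along theta
    until blocked by circle g (center cg, radius rg)."""
    d = math.dist(cf, cg)
    if d + rf <= rg:
        # f inside g: the center of f exits the circle of radius rg - rf.
        hits = ray_circle_hits(cf, theta, cg, rg - rf)
        return hits[-1] if hits else None
    if d + rg <= rf:
        # g inside f: move g the opposite way until it meets f from inside.
        hits = ray_circle_hits(cg, theta + math.pi, cf, rf - rg)
        return hits[-1] if hits else None
    # Outside each other: the center of f reaches the circle of radius rf + rg.
    hits = ray_circle_hits(cf, theta, cg, rf + rg)
    return hits[0] if hits else None
```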

Rotating a line segment f until blocked by another line segment g.

This case contains four subcases, all of which must be considered. The tightest limit returned by any of the subcases is the limit returned by this case.

Figure 9.11: Rotating a line segment f until its endpoint p(f) touches another line segment g.

Rotating f until its endpoint p(f) touches g.

This subcase is depicted in figure 9.11. Construct a circle c whose center is the pivot point p and whose radius is the distance from p to the endpoint p(f) of line segment f. This circle will be called a pivot circle. If c does not intersect line segment g then this subcase does not limit the rotation of F about the pivot point p. However, if c does intersect g at a single point $p_1$ then $\theta(p, p(f)) - \theta(p, p_1)$ is a limit on the clockwise rotation of F about the pivot point p while $\theta(p, p_1) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. If c intersects g at two points $p_1$ and $p_2$ then the larger of $\theta(p, p(f)) - \theta(p, p_1)$ and $\theta(p, p(f)) - \theta(p, p_2)$ is a limit on clockwise rotation while the larger of $\theta(p, p_1) - \theta(p, p(f))$ and $\theta(p, p_2) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. The position of f after the maximal clockwise rotation is depicted as $f_1$ in figure 9.11. Ignoring limits introduced by other subcases, the position of f after the maximal counterclockwise rotation is depicted as $f_2$ in figure 9.11.
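The pivot-circle construction reduces to intersecting a circle with a segment and measuring angles about p. The sketch below reports, for each intersection point $p_i$, the clockwise and counterclockwise offsets $\theta(p, p(f)) - \theta(p, p_i)$ and $\theta(p, p_i) - \theta(p, p(f))$. Reducing these modulo $2\pi$ into $[0, 2\pi)$ is an assumption about normalization, and combining the per-point values is left to the rules stated above; all names are hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def orientation(p1, p2):
    """theta(p1, p2): orientation of the line from p1 to p2."""
    return math.atan2(p2[1] - p1[1], p2[0] - p1[0])

def circle_segment_points(center, radius, a, b, eps=1e-12):
    """Points where the circle meets segment ab (zero, one or two)."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    wx, wy = a[0] - center[0], a[1] - center[1]
    qa = ex * ex + ey * ey                  # quadratic in the segment parameter s
    qb = 2 * (wx * ex + wy * ey)
    qc = wx * wx + wy * wy - radius * radius
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return []
    root = math.sqrt(disc)
    params = sorted({(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)})
    return [(a[0] + s * ex, a[1] + s * ey) for s in params if -eps <= s <= 1 + eps]

def rotation_offsets(pivot, endpoint, a, b):
    """For each point where the pivot circle through `endpoint` meets segment
    ab, return (clockwise, counterclockwise) offsets, each in [0, 2*pi)."""
    radius = math.dist(pivot, endpoint)
    start = orientation(pivot, endpoint)
    result = []
    for point in circle_segment_points(pivot, radius, a, b):
        angle = orientation(pivot, point)
        result.append(((start - angle) % TWO_PI, (angle - start) % TWO_PI))
    return result
```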

This subcase has a boundary case to consider when the pivot circle c intersects g at one of its endpoints. If either $p_1$ or $p_2$ in the above discussion is an endpoint of g, then that point is considered an intersection of c with g, for the purposes of limiting the rotation of f, only if $|\theta(f) - \theta(p(f), p)| < \pi/2$ when normalized. This boundary case is illustrated in figure 9.12, where the endpoint p(g) of line segment g limits the rotation of f but not the rotation of f'.
Rotating f until its endpoint q(f) touches g.

This subcase is analogous to the first subcase except that the pivot circle is constructed with a radius equal to the distance from p to q(f) instead of the distance from p to p(f).

Rotating f until it touches the endpoint p(g).

This subcase reduces to the first subcase by rotating g in the opposite direction until p(g) touches f. Clockwise limits become counterclockwise limits and vice versa.

Rotating f until it touches the endpoint q(g).

This subcase reduces to the second subcase by rotating g in the opposite direction until q(g) touches f. Clockwise limits become counterclockwise limits and vice versa.

Rotating a circle f until blocked by a line segment g.

This case contains three subcases, all of which must be considered. The tightest limit returned by any of the subcases is the limit returned by this case.

Rotating f until it is tangent to g.

This subcase is depicted in figure 9.13. Construct two line segments, $g_1$ and $g_2$, parallel to and on either side of the line segment g, separated from g by a distance equal to the radius of the circle f. The endpoints of $g_1$ and $g_2$ are those that result from moving the endpoints of g a distance equal to the radius of f along axes perpendicular to g. The line segments $g_1$ and $g_2$ are the potential loci of the center of f if it were tangent to g. Construct a pivot circle c whose center is the pivot point p and whose radius is the distance from p to the center p(f) of the circle. If c intersects neither $g_1$ nor $g_2$, then this subcase does not limit the rotation of F about the pivot point p. However, if c does intersect $g_1$ at a single point $p_1$ then $\theta(p, p(f)) - \theta(p, p_1)$ is a limit on the clockwise rotation of F about the pivot point p while $\theta(p, p_1) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. If c intersects $g_1$ at two points $p_1$ and $p_2$ then the larger of $\theta(p, p(f)) - \theta(p, p_1)$ and $\theta(p, p(f)) - \theta(p, p_2)$ is a limit on clockwise rotation while the larger of $\theta(p, p_1) - \theta(p, p(f))$ and $\theta(p, p_2) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. Likewise, if c intersects $g_2$ at a single point $q_1$ then $\theta(p, p(f)) - \theta(p, q_1)$ is a limit on clockwise rotation while $\theta(p, q_1) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. If c intersects $g_2$ at two points $q_1$ and $q_2$ then the larger of $\theta(p, p(f)) - \theta(p, q_1)$ and $\theta(p, p(f)) - \theta(p, q_2)$ is a limit on clockwise rotation while the larger of $\theta(p, q_1) - \theta(p, p(f))$ and $\theta(p, q_2) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. The position of f after the maximal clockwise rotation is depicted as $f_1$ in figure 9.13 while the position of f after the maximal counterclockwise rotation is depicted as $f_2$.

Rotating f until it touches the endpoint p(g).

This subcase reduces to the second subcase of the next case by rotating the line segment g in the opposite direction until its endpoint p(g) touches the circle f. Clockwise limits become counterclockwise limits and vice versa.

Rotating f until it touches the endpoint q(g).

This subcase reduces to the third subcase of the next case by rotating the line segment g in the opposite direction until its endpoint q(g) touches the circle f. Clockwise limits become counterclockwise limits and vice versa.

Rotating a line segment f until blocked by a circle g.

This  case  contains  three  subcases,  all  of  which  must  be  considered.  The  tightest  limit  returned  by 
any  of  the  subcases  is  the  limit  returned  by  this  case. 

Rotating f until it is tangent to g.

This subcase reduces to the first subcase of the previous case by rotating the circle g in the opposite direction until it is tangent to the line segment f. Clockwise limits become counterclockwise limits and vice versa.

Rotating f until its endpoint p(f) touches g.

This subcase is depicted in figure 9.14. Construct a pivot circle c whose center is the pivot point p and whose radius is the distance from p to the endpoint p(f) of the line segment. If c does not intersect the circle g then this subcase does not limit the rotation of F about the pivot point p. However, if c does intersect g then it will do so at two points, $p_1$ and $p_2$, which may degenerate to the same point. The larger of $\theta(p, p(f)) - \theta(p, p_1)$ and $\theta(p, p(f)) - \theta(p, p_2)$ is a limit on the clockwise rotation of F about the pivot point p, while the larger of $\theta(p, p_1) - \theta(p, p(f))$ and $\theta(p, p_2) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. Ignoring limits introduced by other subcases, the position of f after the maximal clockwise rotation is depicted as $f_1$ in figure 9.14, while the position of f after the maximal counterclockwise rotation is depicted as $f_2$.
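Here the pivot circle meets a second circle rather than a segment, so only the intersection step changes while the angle bookkeeping stays the same. A hedged sketch using the standard chord construction for two circles; the names are hypothetical, offsets are reduced modulo $2\pi$ as an assumption, and concentric circles are reported as having no intersection so that the coincident configuration is left to separate degenerate-case handling.

```python
import math

TWO_PI = 2 * math.pi

def orientation(p1, p2):
    """theta(p1, p2): orientation of the line from p1 to p2."""
    return math.atan2(p2[1] - p1[1], p2[0] - p1[0])

def circle_circle_points(c0, r0, c1, r1, eps=1e-12):
    """Intersection points of two circles: two points (possibly equal) or none."""
    d = math.dist(c0, c1)
    if d < eps or d > r0 + r1 + eps or d < abs(r0 - r1) - eps:
        return []
    a = (r0 * r0 - r1 * r1 + d * d) / (2 * d)   # distance from c0 to the chord
    h = math.sqrt(max(r0 * r0 - a * a, 0.0))    # half the chord length
    ux, uy = (c1[0] - c0[0]) / d, (c1[1] - c0[1]) / d
    mx, my = c0[0] + a * ux, c0[1] + a * uy     # midpoint of the chord
    return [(mx - h * uy, my + h * ux), (mx + h * uy, my - h * ux)]

def rotation_offsets_circle(pivot, endpoint, center_g, radius_g):
    """(clockwise, counterclockwise) offsets in [0, 2*pi) to each point where
    the pivot circle through `endpoint` meets circle g."""
    radius = math.dist(pivot, endpoint)
    start = orientation(pivot, endpoint)
    result = []
    for point in circle_circle_points(pivot, radius, center_g, radius_g):
        angle = orientation(pivot, point)
        result.append(((start - angle) % TWO_PI, (angle - start) % TWO_PI))
    return result
```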
Rotating f until its endpoint q(f) touches g.

This subcase is analogous to the second subcase except that the pivot circle is constructed with a radius equal to the distance from p to q(f) instead of the distance from p to p(f).

Rotating a circle f until blocked by another circle g.

This case contains three disjoint subcases. The applicable subcase can be determined analytically by examining the centers and radii of the circles f and g.

Figure 9.15: Rotating a circle f until blocked by another circle g when f and g are outside each other.

The  circles  are  outside  each  other. 

This subcase is depicted in figure 9.15. In this subcase the circle f is rotated until it is tangent to and outside the circle g. Construct a circle $g_1$, concentric with g, whose radius is the sum of the radii of f and g. Construct a pivot circle c whose center is the pivot point p and whose radius is the distance from p to the center p(f) of f. If c does not intersect $g_1$ then this subcase does not limit the rotation of F about the pivot point p. However, if c does intersect $g_1$, then it will do so at two points, $p_1$ and $p_2$, which may degenerate to the same point. The larger of $\theta(p, p(f)) - \theta(p, p_1)$ and $\theta(p, p(f)) - \theta(p, p_2)$ is a limit on the clockwise rotation of F about the pivot point p, while the larger of $\theta(p, p_1) - \theta(p, p(f))$ and $\theta(p, p_2) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. The position of f after the maximal clockwise rotation is depicted as $f_1$ in figure 9.15, while the position of f after the maximal counterclockwise rotation is depicted as $f_2$.

The circle f is inside g.

This subcase is depicted in figure 9.16. In this subcase the circle f is rotated until it is tangent to and inside the circle g. Construct a circle $g_1$, concentric with g, whose radius is the radius of g minus the radius of f. Construct a pivot circle c whose center is the pivot point p and whose radius is the distance from p to the center p(f) of f. If c does not intersect $g_1$ then this subcase does not limit the rotation of F about the pivot point p. However, if c does intersect $g_1$ then it will do so at two points, $p_1$ and $p_2$, which may degenerate to the same point. The larger of $\theta(p, p(f)) - \theta(p, p_1)$ and $\theta(p, p(f)) - \theta(p, p_2)$ is a limit on the clockwise rotation of F about the pivot point p, while the larger of $\theta(p, p_1) - \theta(p, p(f))$ and $\theta(p, p_2) - \theta(p, p(f))$ is the corresponding limit in the counterclockwise direction. The position of f after the maximal clockwise rotation is depicted as $f_1$ in figure 9.16, while the position of f after the maximal counterclockwise rotation is depicted as $f_2$.

Figure 9.16: Rotating a circle f until blocked by another circle g when f is inside g.

The circle g is inside f.

This subcase reduces to the second subcase by rotating g in the opposite direction until blocked by f. Clockwise limits become counterclockwise limits and vice versa.

9.3  Complications 

The algorithm presented in the previous two sections is only a framework for kinematic simulation. It handles only the general cases, not the complications caused by the many anomalous special cases that arise during actual use of the simulator to support analysis of animated stick figure movies like the one described in section 6.1. This section discusses some of these complications and how to deal with them. During the development of Abigail, the process of discovering that these anomalous cases existed, and then determining how to deal with them correctly, was substantially more difficult and took significantly more time and effort than implementing the general case. One may ask whether it is necessary to handle all of these special cases. Many of them were discovered because the event perception mechanism built on top of the imagination capacity would produce the wrong results due to incorrect handling of these anomalous cases. For example, prior to dealing with roundoff errors, objects would mysteriously and unpredictably fall through the floor, for reasons which will be discussed in a later section.


9.3.1  Clusters 


As described in section 9.1, at each step during imagination, the kinematic simulator attempts to translate or rotate a single set of figures, the foreground, leaving the remaining figures, the background, stationary. Foregrounds were chosen as connected components in the connection graph of the image, i.e. sets of figures connected by joints. Figure 9.17(a) depicts problems that arise with this simple choice of foregrounds. The figure shows two interlocking yet distinct objects, A and B. Since they are not joined together they constitute separate connected components and will be considered as separate foregrounds for translation and rotation. However, when attempting to translate A downward alone, B blocks any downward motion of A. Likewise, when attempting to translate B downward alone, A blocks any downward motion of B. Thus neither A nor B will fall when simulated; they will remain suspended in mid-air. This same situation happens not only for the case of falling: it can happen for all of the types of movement considered in section 9.1. These include falling down, falling over, sliding along a linear or circular surface, and varying a joint's flexible rotation and translational or rotational displacement parameters. Figure 9.17(b) depicts two objects jointly sliding down an inclined plane. Figure 9.17(c) depicts two objects jointly falling over. Figure 9.17(d) shows how the problem can arise when varying the flexible rotation parameter of a joint which would jointly pivot two interlocking objects about that joint. The problem occurs even without interlocking objects. The heavy ball in figure 9.17(e) will not push the see-saw down, since the ball and see-saw are distinct connected components and thus they will not rotate together around the pivot. The see-saw prevents downward movement of the ball, yet the see-saw alone will not rotate since rotating it alone would increase its potential energy.

The solution to this problem is conceptually simple: treat A and B together as a single foreground called a cluster. More generally, the solution can be stated as follows. Form all connected components $F_1, \ldots, F_n$ in the connection graph of the image. Two connected components are said to touch if some figure from one touches some figure from the other. Consider as a foreground every cluster F that is the union of a collection of connected components, i.e. $F = F_{i_1} \cup \cdots \cup F_{i_m}$, where the collection of connected components is itself connected by the component touching relation. When varying a flexible parameter of a joint j, only clusters which do not contain both f(j) and g(j) are considered.
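The cluster definition amounts to enumerating every connected subset of the component-touching graph. A brute-force Python sketch follows; representing components as frozensets of figure names, passing the touching relation as a predicate, and the names `is_connected` and `candidate_clusters` are our assumptions. Its running time grows exponentially with the number of mutually touching components.

```python
from itertools import combinations

def is_connected(components, touches):
    """True if the components form one group under the touching relation."""
    remaining = set(components)
    stack = [remaining.pop()]
    while stack:
        current = stack.pop()
        neighbors = {c for c in remaining if touches(current, c)}
        remaining -= neighbors
        stack.extend(neighbors)
    return not remaining

def candidate_clusters(components, touches, joint=None):
    """All unions of touching-connected collections of connected components.
    `joint`, if given, is the pair (f(j), g(j)) of figures joined by the joint
    whose flexible parameter is varied; clusters containing both are skipped."""
    clusters = []
    for k in range(1, len(components) + 1):
        for subset in combinations(components, k):
            if not is_connected(subset, touches):
                continue
            cluster = frozenset().union(*subset)
            if joint is not None and joint[0] in cluster and joint[1] in cluster:
                continue
            clusters.append(cluster)
    return clusters

A, B, C = frozenset({"a1", "a2"}), frozenset({"b"}), frozenset({"c"})
touching = {frozenset({A, B}), frozenset({B, C})}   # A touches B, B touches C

def touches(x, y):
    return frozenset({x, y}) in touching

print(len(candidate_clusters([A, B, C], touches)))  # prints 6; {A, C} is excluded
```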

The above solution has a drawback, however. It becomes intractable when there is a large set of connected components that are connected by the touching relation, since every subset of that set which is still connected by the touching relation must be considered as a cluster. This situation does arise in practice in at least one case. Abigail begins watching a movie with an empty joint model. Objects containing many figures, which will later be treated as a single connected component due to joints not yet hypothesized, will initially be treated as clusters. While two joined figures will always be considered as part of the same foreground, two touching but unjoined figures are only optionally considered as part of the same cluster. Such nondeterminism in the choice of clustered foregrounds with an empty joint model leads to intractability in the kinematic simulator. This intractability is eliminated in the current implementation by forming clusters only once an initial joint model has been formulated.
Figure 9.17: These situations require cluster movement. When attempting to move either object A or B alone, one will block any motion of the other, yielding anomalous simulation results where objects A and B remain suspended but unsupported. The solution is to treat A and B as a single clustered foreground and attempt to translate or rotate them together.

9.3.2 Tangential Movement

Section 9.2 presented analytic methods for calculating the maximal amount that one figure can translate or rotate until blocked by another figure. The methods presented dealt only with the non-degenerate cases. Some of the computations required finding the intersection between a translation ray and a line segment. What happens if the ray is coincident with the line segment? In this case, they intersect at infinitely many points. This degenerate case can arise when one line segment slides along another. Other computations require finding the intersection between a pivot circle and another circle. What happens if the two circles are concentric and equiradial? In this case again, they intersect at infinitely many points. This degenerate case can arise when pivoting a line segment that lies inside a circle about the center of the circle, so that its endpoint slides along the interior wall of the circle.
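Both degeneracies can be detected before the ordinary intersection code runs. A minimal sketch with hypothetical names and an assumed tolerance `eps`: a ray and a segment coincide when they are parallel, the segment lies on the ray's line, and some part of it lies ahead of the ray origin; two circles coincide when they are concentric and equiradial.

```python
import math

def ray_coincident_with_segment(origin, theta, a, b, eps=1e-9):
    """True when segment ab lies along the ray, so they share infinitely many points."""
    dx, dy = math.cos(theta), math.sin(theta)
    ex, ey = b[0] - a[0], b[1] - a[1]
    wx, wy = a[0] - origin[0], a[1] - origin[1]
    parallel = abs(dx * ey - dy * ex) < eps          # segment parallel to the ray
    on_line = abs(dx * wy - dy * wx) < eps           # endpoint a on the ray's line
    proj_a = wx * dx + wy * dy                       # signed distance of a along the ray
    proj_b = (b[0] - origin[0]) * dx + (b[1] - origin[1]) * dy
    return parallel and on_line and max(proj_a, proj_b) >= -eps

def circles_coincide(c0, r0, c1, r1, eps=1e-9):
    """True when two circles are concentric and equiradial."""
    return math.dist(c0, c1) < eps and abs(r0 - r1) < eps
```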

In general, all such degenerate cases involve movement tangent to some surface. Though the above cases of tangential movement resulted in degenerate computation of intersection points, tangential movement need not produce such degeneracies. One example of such a situation would be the translation of a line segment until its endpoint was blocked by a circle. If the translation ray is tangent to the circle, it intersects the circle at one point instead of two. Sometimes, a surface that is tangent to the direction of motion does not block motion of the foreground. The first two examples are illustrations of such situations. In other situations, a surface that is tangent to the direction of motion can block motion of the foreground. The third example depicts such a situation. Each of the eight cases discussed in section 9.2, and all of their subcases, must be analyzed in detail to determine when the background blocks tangential movement of the foreground, and when it does not. Detailed analysis of each of these cases has demonstrated that in all cases where f does not touch g, if g would limit tangential movement of f then that movement would be even further limited by some other non-tangential case. Thus the limits introduced by tangential movement can be ignored when f does not touch g. When f touches g, however, g may or may not totally limit any tangential movement of f, depending on the situation. This analysis for each of the ten irreducible subcases is summarized below and depicted in figures 9.18 and 9.19.

Translating a line segment f until its endpoint p(f) touches another line segment g.

Tangential movement arises in this subcase when the translation ray r is coincident with g. A line segment g never limits tangential movement of another line segment f. This subcase is depicted in figure 9.18(a).

Translating a circle f until it is tangent to a line segment g.

Tangential movement arises in this subcase when the translation ray r is coincident with either $g_1$ or $g_2$. This subcase never limits tangential movement. This subcase is depicted in figure 9.18(b).

Translating a line segment f until its endpoint p(f) touches a circle g.

Tangential movement arises in this subcase when the translation ray r is tangent to circle g. This subcase limits tangential movement only when f is inside g. This is the case only when $|\theta(f) - \theta(p_1, p(g))| < \pi/2$ when normalized. The subcase where g blocks f is depicted in figure 9.18(e), while the subcase where g does not block f is depicted in figure 9.18(c).
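The blocking test for this subcase can be sketched directly from the stated condition. In these hypothetical helpers, `p1` is the point where r touches g (when f already touches g this is p(f) itself), angles are normalized into $(-\pi, \pi]$, and the tangency check compares the perpendicular distance from the center of g to the line of r with the radius of g.

```python
import math

def normalize(angle):
    """Map an angle into (-pi, pi]."""
    a = math.fmod(angle, 2 * math.pi)
    if a <= -math.pi:
        a += 2 * math.pi
    elif a > math.pi:
        a -= 2 * math.pi
    return a

def orientation(p1, p2):
    """theta(p1, p2): orientation of the line from p1 to p2."""
    return math.atan2(p2[1] - p1[1], p2[0] - p1[0])

def ray_tangent_to_circle(origin, theta, center, radius, eps=1e-9):
    """True when the line of the ray just grazes the circle."""
    dx, dy = math.cos(theta), math.sin(theta)
    wx, wy = center[0] - origin[0], center[1] - origin[1]
    return abs(abs(dx * wy - dy * wx) - radius) < eps

def tangential_blocked(theta_f, p1, center_g):
    """Segment f touching circle g at p1 and moving tangentially is blocked
    only when f points into g: |theta(f) - theta(p1, p(g))| < pi/2."""
    return abs(normalize(theta_f - orientation(p1, center_g))) < math.pi / 2
```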

Translating a circle f until blocked by a circle g when f and g are outside each other.

Tangential movement arises in this subcase when the translation ray r is tangent to $g_1$. This subcase never blocks tangential movement. This subcase is depicted in figure 9.18(d).

Translating a circle f until blocked by another circle g when f is inside g.

Tangential movement arises in this subcase when the translation ray r is tangent to $g_1$. This subcase always blocks tangential movement. This subcase is depicted in figure 9.18(f).
Figure 9.19: An analysis of all cases where the rotation of the foreground figure f is tangential to
the background figure g. In cases (b), (d), (e), (i), (j), (k), (m), and (o) g blocks movement of f
while in the remaining cases it does not.
Rotating a line segment f until its endpoint p(f) touches another line segment g.

Tangential movement arises in this subcase when the pivot circle c is tangent to g. This subcase
limits tangential movement only when $|\theta(f) - \theta(p(f), p)| > \pi/2$ when normalized. A boundary
case arises when $|\theta(f) - \theta(p(f), p)| = \pi/2$. This boundary case will be discussed in a later
section. The subcase where g blocks f is depicted in figure 9.19(b), while the subcase where g does not
block f is depicted in figure 9.19(a).

Rotating a circle f until it is tangent to a line segment g.

Tangential movement arises in this subcase when the pivot circle c is tangent to either $g_1$ or $g_2$.
This subcase limits tangential movement only when p and p(f) are on opposite sides of g or
when p is closer to g than p(f). These two subcases where g blocks f are depicted in figures 9.19(d)
and 9.19(e) respectively, while the subcase where g does not block f is depicted in figure 9.19(c).
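The two blocking conditions for this subcase reduce to a signed-area side test and a perpendicular distance. The sketch below assumes that "closer to g" means closer to the infinite line through g's two (distinct) endpoints, and that p(f) is the center of the circle f; the function names are hypothetical.

```python
import math

def side(a, b, x):
    """Twice the signed area of triangle (a, b, x): > 0 left of a->b, < 0 right, 0 on the line."""
    return (b[0] - a[0]) * (x[1] - a[1]) - (b[1] - a[1]) * (x[0] - a[0])

def line_distance(a, b, x):
    """Perpendicular distance from x to the infinite line through a and b (a != b)."""
    return abs(side(a, b, x)) / math.hypot(b[0] - a[0], b[1] - a[1])

def circle_rotation_limited(g_start, g_end, pivot, center_f):
    """Hypothetical check: does line segment g limit tangential rotation of circle f?

    True when the pivot p and the center p(f) lie on opposite sides of g,
    or when p is closer to g than p(f) is.
    """
    opposite = side(g_start, g_end, pivot) * side(g_start, g_end, center_f) < 0
    closer = line_distance(g_start, g_end, pivot) < line_distance(g_start, g_end, center_f)
    return opposite or closer
```

For a floor g from (0, 0) to (10, 0) and a circle centered at (5, 2), a pivot at (0, -1) (opposite side) or at (0, 1) (same side, but closer) limits the movement, while a pivot at (0, 5) does not.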

Rotating a line segment f until its endpoint p(f) touches a circle g.

Tangential movement arises in this subcase when the pivot circle c is tangent to g. There are three
subcases to consider.

c is inside g.

This subcase limits tangential movement only when f is outside g. This is the case only
when $|\theta(p(f), p(g)) - \theta(f)| > \pi/2$ when normalized. The subcase where g blocks f is depicted
in figure 9.19(k), while the subcase where g does not block f is depicted in figure 9.19(h).

g is inside c.

This subcase limits tangential movement only when f is inside g. This is the case only
when $|\theta(p(f), p(g)) - \theta(f)| < \pi/2$ when normalized. The subcase where g blocks f is depicted
in figure 9.19(j), while the subcase where g does not block f is depicted in figure 9.19(g).

g and c are outside each other.

This subcase limits tangential movement only when f is inside g. This is the case only
when $|\theta(p(f), p(g)) - \theta(f)| < \pi/2$ when normalized. The subcase where g blocks f is depicted
in figure 9.19(i), while the subcase where g does not block f is depicted in figure 9.19(f).
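Which of the three subcases applies depends only on how the pivot circle c and the circle g are tangent. A rough sketch, assuming both circles are given by center and radius and that tangency is detected within a small hypothetical tolerance; the boolean `f_inside_g` stands for the normalized-angle test stated in each subcase, and all names are illustrative.

```python
import math

def classify_tangent_circles(center_c, r_c, center_g, r_g, tol=1e-9):
    """Classify two tangent circles: the pivot circle c and the background circle g."""
    d = math.dist(center_c, center_g)
    if abs(d - (r_c + r_g)) <= tol:
        return "outside"  # externally tangent: g and c are outside each other
    if abs(d - abs(r_c - r_g)) <= tol:
        # Internally tangent: the smaller circle lies inside the larger one.
        return "c_inside_g" if r_c < r_g else "g_inside_c"
    raise ValueError("c and g are not tangent within tolerance")

def segment_rotation_limited(relation, f_inside_g):
    """Apply the blocking rule of the matching subcase.

    f_inside_g is the test |theta(p(f), p(g)) - theta(f)| < pi/2 (normalized).
    """
    if relation == "c_inside_g":
        return not f_inside_g  # c inside g: limits only when f is outside g
    return f_inside_g  # g inside c, or outside each other: limits only when f is inside g
```

Splitting the geometry (`classify_tangent_circles`) from the rule table (`segment_rotation_limited`) keeps the three subcases readable side by side.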

Rotating a circle f until blocked by a circle g when f and g are outside each other.

Tangential movement arises in this subcase when the pivot circle c is tangent to $g_1$. This subcase
limits tangential movement only when c is inside $g_1$. This is the case only when
$d(p(g), p) < d(p(g), q(g)) + d(p(f), q(f))$, where $d$ denotes Euclidean distance. The subcase where
g blocks f is depicted in figure 9.19(m), while the subcase where g does not block f is depicted in
figure 9.19(l).

Rotating a circle f until blocked by another circle g when f is inside g.

Tangential movement arises in this subcase when the pivot circle c is tangent to $g_1$. This subcase
limits tangential movement only when c is outside $g_1$. This is the case only when
$d(p(g), p) > d(p(g), q(g)) - d(p(f), q(f))$, where $d$ denotes Euclidean distance. The subcase where
g blocks f is depicted in figure 9.19(o), while the subcase where g does not block f is depicted in
figure 9.19(n).

9.3.3  Touching  Barriers 

Section 9.2 presented analytic methods for calculating the maximal translation or rotation of one figure
until blocked by another figure. The methods presented dealt only with the non-degenerate case of
movement by some nonzero $\varepsilon$. When, however, a figure f to be moved touches a figure g, g may prevent
any movement of f along a given axis or in a given direction about a given pivot. In such cases, the
analytic methods from section 9.2 will yield $\varepsilon = 0$. If movement of f is indeed blocked by g then this
is the correct solution. But there are cases where the analytic methods incorrectly yield $\varepsilon = 0$ even
Figure 9.20: The analytic methods for determining maximum translation and rotation yield $\varepsilon = 0$
when two figures touch. Sometimes movement is indeed blocked in this situation, as in (a), while
other times, movement is not blocked, as in (b).


though f is not blocked by g. This happens when f touches g but is on the other side of g given its
direction of movement. Figure 9.20 shows how this can arise when translating one line segment relative
to another line segment. In figure 9.20(a), g blocks translation of f, while in figure 9.20(b), g does not
block translation of f. Analogous situations occur when translating or rotating any figure type relative
to any other figure type.

To deal with this problem, the analytic methods must be augmented to determine whether g is or is
not a barrier to the movement of f when they would otherwise yield $\varepsilon = 0$. All of the cases and subcases
can be handled by the same general technique, which operates as follows. The maximal movement $\varepsilon$ will
be limited to zero only when f and g touch. Denote their point of contact by q. Form a line l through q
as follows. If g is a line segment then it is extended to form l. If g is a circle then l is the line tangent to g
at point q. This barrier line divides the plane into two half-planes. The figure f will lie in at most one of
these half-planes. Let $\sigma$ be the direction of the movement of f. A ray projected from q in the direction $\sigma$
will also lie in at most one half-plane. The figure g blocks the movement of f only when f does not lie in
the same half-plane as that ray. Applying this technique to each case and subcase requires determining
both the half-plane in which f lies and the direction of movement $\sigma$. The former depends on the
shape of f. If f is a line segment, one endpoint lies on l at the point of contact q. The other endpoint
occupies the same half-plane as all of f. If f is a circle, its center p(f) occupies the same half-plane as all
of f. Thus the half-plane occupied by a figure f can be determined by examining a single
point, which I will denote as q'. When translating f along an axis $\theta$, the direction of movement $\sigma$ is the
same as $\theta$. When rotating f about a pivot point p, the direction of movement is given by the direction
of a ray projected from the contact point q tangent to a circle c whose center is p and whose radius is
the distance from p to q. For clockwise rotation this is $\theta(p, q) - \pi/2$, while for counterclockwise rotation
this is $\theta(p, q) + \pi/2$.
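The two inputs just described, the probe point q' and the direction of movement, can be sketched as below. The `Segment` and `Circle` classes are hypothetical stand-ins for Abigail's figures, points are `(x, y)` tuples, and angles are in radians; for translation along an axis $\theta$ the direction is simply $\theta$ itself, so only rotation needs a helper.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    p: tuple  # endpoint p(f)
    q: tuple  # endpoint q(f)

@dataclass
class Circle:
    center: tuple  # center p(f)
    radius: float

def probe_point(f, contact):
    """q': one point of f that lies in the same half-plane as all of f."""
    if isinstance(f, Circle):
        return f.center
    # For a segment, one endpoint is the contact point q; use the other one.
    return f.q if math.dist(f.p, contact) <= math.dist(f.q, contact) else f.p

def rotation_direction(pivot, contact, clockwise):
    """Direction of movement at q when rotating about p: theta(p, q) -/+ pi/2."""
    base = math.atan2(contact[1] - pivot[1], contact[0] - pivot[0])
    return base - math.pi / 2 if clockwise else base + math.pi / 2
```

For a pivot at the origin and a contact point at (1, 0), counterclockwise rotation moves the contact point straight up ($\pi/2$) and clockwise rotation straight down ($-\pi/2$).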

Given a barrier line l, a direction of movement $\sigma$, and a point q', g blocks the movement of f only
when a ray projected from q', along the axis $\sigma$, intersects l. When applying this check to each of the cases
and subcases one must remember that some of the cases determine whether g blocks the movement of f
by determining whether f blocks the movement of g in the opposite direction. Each case and subcase
must take this into account when computing the parameters l, $\sigma$, and q' for this check procedure.
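The ray test can be written with a signed offset from the barrier line. In this sketch (hypothetical names; l is given by the contact point q and a direction angle), the ray $q' + t(\cos\sigma, \sin\sigma)$ meets l at $t = -s/v$, where $s$ is the signed offset of q' from l and $v$ is the normal component of the movement direction, so g blocks f exactly when $s \cdot v < 0$. The two degenerate situations, $\sigma$ parallel to l and q' lying on l, are returned as `None`; the role swap mentioned above (testing whether f blocks g in the opposite direction) is left to the caller.

```python
import math

def tangent_direction(center, contact):
    """Direction of the barrier line l when g is a circle: tangent to g at q."""
    return math.atan2(contact[1] - center[1], contact[0] - center[0]) + math.pi / 2

def g_blocks_f(q, line_dir, q_prime, sigma, tol=1e-12):
    """Does a ray from q' along direction sigma hit the barrier line l?

    l passes through the contact point q with direction angle line_dir.
    Returns True (g blocks f), False (it does not), or None for the boundary
    cases: sigma parallel to l, or q' lying on l.
    """
    nx, ny = -math.sin(line_dir), math.cos(line_dir)         # unit normal of l
    s = nx * (q_prime[0] - q[0]) + ny * (q_prime[1] - q[1])  # signed offset of q' from l
    v = nx * math.cos(sigma) + ny * math.sin(sigma)          # normal component of the motion
    if abs(v) <= tol or abs(s) <= tol:
        return None
    return s * v < 0  # intersection parameter t = -s / v is positive
```

For a vertical segment standing on a floor (q = (5, 0), q' = (5, 2), l horizontal), moving down is blocked, moving up is not, and sliding sideways is the parallel boundary case handled as tangential movement.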

The above check whether g blocks the movement of f can be viewed as a boundary case of the more
general case of movement discussed in section 9.2. This boundary case itself has two boundary cases.


CHAPTER 9. NAIVE PHYSICS



One occurs when the direction of movement $\sigma$ is parallel to the barrier line l. In this case, neither
half-plane is in front of or behind the figure f. This case is covered by the tangential movement cases
discussed in section 9.3.2. The other occurs when q' lies on the barrier line l. In this case, f does not lie in
either half-plane, and an ambiguity arises as to which side of g the figure f lies on. This can only happen
when f is a line segment. When g is a circle, f can only move in a direction that will keep it outside g.
Analytic methods similar to those discussed above can determine the allowed direction of movement. When
g is a line segment, however, a genuine ambiguity arises. This can only happen when f is coincident with g,
as is depicted in figure 9.21(a). In this case it is genuinely ambiguous as to which side of g the figure f
lies on. This situation therefore admits only two consistent interpretations: either g blocks or doesn't
block f uniformly for any type of movement. Adopting the latter interpretation would lead to problems
since objects then could fall through the floor. Adopting the former interpretation, however, leads to
the anomalous situation depicted in figure 9.21(b) where John falls on his knee, but doesn't fall any
further, since his calf, being coincident to the ground, cannot rotate or translate. Abigail adopts the
former alternative, thus exhibiting this anomaly. A solution to this problem would require modifying the
procedures described in section 9.3.2 to examine the context of two figures, i.e. other figures connected
to either the foreground f or the background g, when determining whether g blocks movement of f.


9.3.4  Tolerance 

All of the procedures described in sections 9.2, 9.3.2, and 9.3.3 must be modified to deal with roundoff
error. Roundoff error can introduce gross substantiality violations in the resulting simulation, as depicted
in figure 9.22. Figure 9.22(a) depicts a line segment f falling toward a line segment g. If the limit
calculation has roundoff error, it can produce a situation, depicted in figure 9.22(b), where f is translated
slightly too far. In the next step of the simulation, however, the endpoint of f is now past g, and thus a
translation ray projected from that endpoint will not intersect g. Thus g limits the translation of f only
to the position indicated in figure 9.22(c). At this point, f can fall away from g as in figure 9.22(d), since
in figure 9.22(c), g does not block f in its direction of movement. Thus due to slight roundoff error in the
transition from figure 9.22(a) to figure 9.22(b), f is able to pass through g. As figure 9.22 shows, roundoff
error can introduce gross deviations from the desired simulation, not just minor differences. Accordingly,
Abigail incorporates a notion of tolerance whenever determining whether two figures touch, so that
figure 9.22(b) is interpreted as an instance of touching barriers to be handled via the methods described
in section 9.3.3. Furthermore, the methods described in section 9.2 must be modified in this case to
return $\varepsilon = 0$ even though the translation ray does not intersect g.
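The tolerance rule can be sketched as a contact test that treats figures within a small distance as touching, plus a limit that is forced to zero for a touching, blocking barrier even when the translation ray misses g. The tolerance value and all names here are hypothetical; the text does not state the value Abigail uses, and the `blocked` flag stands for the touching-barrier check of section 9.3.3.

```python
import math

TOLERANCE = 1e-6  # hypothetical value, chosen only for illustration

def point_segment_distance(x, a, b):
    """Distance from point x to the closed line segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(x, a)
    # Project x onto the segment and clamp the parameter to [0, 1].
    t = ((x[0] - a[0]) * dx + (x[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.dist(x, (a[0] + t * dx, a[1] + t * dy))

def translation_limit(raw_limit, endpoint, g_start, g_end, blocked, tol=TOLERANCE):
    """Movement limit epsilon for an endpoint of f approaching line segment g.

    raw_limit is the exact analytic result (math.inf when the translation ray
    misses g). Within tolerance, f is treated as touching g.
    """
    if point_segment_distance(endpoint, g_start, g_end) <= tol:
        return 0.0 if blocked else raw_limit
    return raw_limit
```

This mirrors figure 9.22(b): an endpoint that overshoots the floor by $10^{-9}$ still yields a zero limit instead of an unbounded fall.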


9.4  Limitations 

The kinematic simulator just presented suffers from a severe limitation. It can only collectively translate
or rotate one group of figures at a time. Such collective movement can correctly simulate either rigid
body motion, or the motion of a non-rigid mechanism where only a single joint parameter changes. It
is not able to correctly simulate the behavior of mechanisms which require that different collections of
figures simultaneously move along different paths. Several such mechanisms are shown in figures 9.23
and 9.24.

The mechanism in figure 9.23 contains two line segments $f_1$ and $f_2$, fastened at the endpoints $p(f_1)$
and $p(f_2)$ by a joint j with flexible rotation and rigid displacement parameters. The endpoints $q(f_1)$
and $q(f_2)$ are supported on the ground. Since the micro-world ontology lacks any notion of friction,
the endpoints $q(f_1)$ and $q(f_2)$ should slide apart along the ground while the flexible joint rotation $\theta(j)$
increases until both $f_1$ and $f_2$ lie flat on the ground. Abigail, however, is not able to predict this motion
since it requires simultaneously rotating the line segments $f_1$ and $f_2$ in opposite directions, as well as


Figure 9.21: An ambiguous situation occurs when the foreground f and background g are two
coincident line segments. In this situation it is not possible to determine on which side of the background
the foreground lies. Because of this ambiguity, Abigail will neither translate nor rotate f relative
to g for fear of violating substantiality, as depicted in (a). A case where this arises in practice is
depicted in (b). Once John falls on his knee he will not fall any further, since his calf, being coincident
to the ground, cannot rotate or translate.
Figure 9.22: Roundoff errors during simulation can cause substantiality violations and result in gross
deviations from the desired simulation. Here an object f falls toward an object g. Ordinarily g should
block the fall of f. Roundoff error during step (b), however, causes a substantiality violation. Since
the endpoint of f is now past g, a translation ray projected from that endpoint will not intersect g,
and thus g will limit the movement of f only until the position indicated in (c). Since in (c), g does
not block f in its direction of movement, f can fall from g as in (d). Thus due to the roundoff error
in (b), f falls through g.




Figure 9.23: A mechanism whose behavior Abigail cannot predict. This mechanism has two line
segments $f_1$ and $f_2$, and a single joint j, where $f(j) = f_1$, $g(j) = f_2$, $\theta(j)$ is flexible, $\eta_f(j) = 0$,
and $\eta_g(j) = 0$. The endpoints $q(f_1)$ and $q(f_2)$ should slide along the ground while $\theta(j)$ increases
until $f_1$ and $f_2$ lie flat on the ground. Abigail is not able to predict such motion since it requires
the simultaneous rotation and translation of $f_1$ and $f_2$ along different paths.
Figure 9.24: A four bar linkage. Using the terminology of this thesis, this linkage can be described
as four line segments $f_0, \ldots, f_3$ and four joints $j_0, \ldots, j_3$, where for $i = 0, \ldots, 3$,
$f(j_i) = f_i$, $g(j_i) = f_{(i+1) \bmod 4}$, $\eta_f(j_i) = 0$, $\eta_g(j_i) = 1$, and $\theta(j_i)$ is flexible. Abigail
cannot predict the behavior of such linkages since changing the rotation parameter of any joint would
require the simultaneous motion of at least three line segments along different paths.

translating them collectively downward, in order to decrease the potential energy of the mechanism.
Any one of these movements alone will increase the potential energy, so no movement will be attempted.

The mechanism in figure 9.24 is a classic four bar linkage. It contains four line segments $f_0, \ldots, f_3$
joined at their endpoints by four joints $j_0, \ldots, j_3$ with flexible rotation and rigid displacement parameters.
Assuming that one of the line segments has a fixed position and orientation, changing the rotation
parameter of any one of the joints will cause all of the joint rotation parameters to change and the
remaining line segments to translate and rotate along different paths.

Both of these mechanisms share a common property: they have a cycle in their connection graph.⁹
The cycle in figure 9.24 is apparent. The cycle in figure 9.23 results from the fact that, due to the
ground plane constraint, the mechanism behaves as if the ground were a line segment g and figures $f_1$
and $f_2$ were joined to g by joints with flexible rotations, rigid displacements along $f_1$ and $f_2$, and flexible
displacements along g.

Abigail can only accurately predict the behavior of mechanisms whose connection graphs do not
contain cycles.¹⁰ This includes both explicit cycles due to joints as well as implicit cycles due to the
ground plane and substantiality constraints. This means that the kinematic simulator used to implement
Abigail's imagination capacity is not cognitively plausible, since people can understand the behavior
of such mechanisms. While a person might not be able to accurately calculate the exact quantitative
relationship between the motion of parts A and B in the mechanism shown in figure 9.25, she nonetheless
could at least predict that pushing A will cause B to move and perhaps even predict the direction of
motion.
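Whether a mechanism falls outside this limitation can be phrased as a cycle test on its connection graph. A minimal union-find sketch follows; modelling the ground as an extra vertex joined to every figure it supports is an assumption that mirrors the implicit cycle described for figure 9.23, not a documented part of Abigail, and the names are hypothetical.

```python
def has_cycle(vertices, edges):
    """Report whether an undirected (multi)graph contains a cycle, via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected, so this edge closes a cycle
        parent[root_a] = root_b
    return False

# Four bar linkage of figure 9.24: joint j_i connects f_i and f_{(i+1) mod 4}.
four_bar = [f"f{i}" for i in range(4)]
four_bar_joints = [(f"f{i}", f"f{(i + 1) % 4}") for i in range(4)]

# Figure 9.23 with the ground treated as a hypothetical extra vertex.
figure_9_23 = ["f1", "f2", "ground"]
figure_9_23_edges = [("f1", "f2"), ("f1", "ground"), ("f2", "ground")]
```

Both mechanisms report a cycle, while the joint between $f_1$ and $f_2$ alone, without the implicit ground edges, does not.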


9.5  Experimental  Evidence 

Spelke (1988) reports a number of experiments that illuminate the nature of infant visual perception.
Most of these experiments use the paradigms of habituation/dishabituation and preferential looking

⁹The connection graph of a mechanism is a graph where the figures constitute the vertices and there is an undirected
edge between two vertices if their corresponding figures are joined.

¹⁰She can still watch movies that depict such mechanisms without breaking. She will just treat a cyclic mechanism as a
rigid body.


Figure 9.25: Abigail's imagination capacity is impoverished with respect to human imagination
capacity. While humans can predict that pushing A will cause B to move, Abigail cannot make
such a prediction, since the connection graph of this mechanism contains cycles and the kinematic
simulator used to implement Abigail's imagination capacity cannot handle cycles.




as windows on infant perception. A general property of the nervous system is that it habituates to
repeated stimuli. The level of response elicited from repeated applications of similar stimuli decreases
when compared with the initial application of the stimulus. After habituation, however, application of
a novel stimulus will again elicit a greater level of response. Since this dishabituation happens only for
novel stimuli, it can be used as a probe to determine whether two stimuli are characterized as similar
or different. The experimental framework is as follows. Subjects are first habituated to stimulus A and
then exposed to stimulus B. Alternatively, they are habituated to A and then exposed to C. A greater
level of dishabituation for C than for B is taken as evidence that B is classified as more similar to A
than C is. In the case of infants, the response level is often measured by preferential looking, measuring
the amount of time they look at a presented stimulus, or at one stimulus versus another.

Spelke reports two experiments which give evidence that by age five months, children are aware of the
substantiality constraint. The first experiment was originally reported by Baillargeon et al. (1985). This
experiment is illustrated in figure 9.26. Infants were habituated to a scenario depicting a screen. Initially
the screen lay flat on its front. Subsequently, it lifted upwards and rotated backwards until it lay flat on
its back. Finally, its motion was reversed until it again lay flat on its front. To make this motion clear,
both front and side views are depicted in figure 9.26(a), though the actual stimulus in the experiment
contained only the front view. The two dishabituation stimuli are shown in figures 9.26(b) and 9.26(c).
In both, a block is situated behind the screen such that it is occluded as the screen is raised. The first
depicts a possible event: the screen only rotates as far back as it can without penetrating the occluded
block. The second depicts an impossible event: the screen continues to rotate 180°. Unless the block
disappears, this would constitute a substantiality violation. Five-month-old infants dishabituate more to
the latter scenario than the former. This is interpreted as evidence that they interpret both scenarios (a)
and (b) as normal but scenario (c) as abnormal. Baillargeon (1987) reports continued experiments along
these lines which show that children are attentive to substantiality violations by age 4½ months and
perhaps even by age 3½ months. Baillargeon (1986) reports additional experiments which show that
children take the location of hidden objects into account in their desire to uphold the substantiality
constraint.

Spelke reports a similar experiment performed jointly with Macomber and Keil on four-month-old
infants. This experiment is depicted in figure 9.27. Here, the infants were habituated to the following
scenario. An object was dropped behind a screen. The screen was then lifted to reveal the object lying
on the ground as shown in figure 9.27(a). The two dishabituation stimuli are shown in figures 9.27(b)
and 9.27(c). In both, a table appears in the path of the falling object when the screen is removed.
The first depicts the object lying on the table, a different position than in the habituation scenario.
The second depicts the object lying underneath the table, in the same position as in the habituation
scenario, yet one which cannot be reached without a substantiality violation. Four-month-old infants
dishabituate more to the latter scenario than the former, again giving evidence that they are cognizant
of the substantiality constraint by age four months.

Spelke reports that Macomber performed a variation of the previous experiment in an attempt to
determine the age at which infants know about gravity. This variation is depicted in figure 9.28. Infants
were habituated to an object falling behind a screen with the screen being removed to reveal the object
lying on a table. In both dishabituation stimuli, the table top was removed. In the first dishabituation
stimulus, removing the screen revealed the object at rest on the floor, beneath its original position on
the table top, while in the second, removing the screen revealed the object at the same position as it
was in the habituation scenario. This time, however, the object was suspended unsupported in mid-air
due to the disappearance of the table top. Spelke reports that four-month-old infants dishabituate more
to the former scenario than the latter, implying that they do not yet form correct judgments based on
gravity and support.

At some point, however, children do come to possess knowledge of gravity and support. The only
question is at what point they do so. I conjecture that such development happens early. If the analysis




Figure 9.26: Displays for an experiment demonstrating infant knowledge of substantiality. (Figure
7.7 from Spelke (1988).) Infants habituated to sequence (a) dishabituate more to sequence (c)
than to sequence (b). Since sequence (c) depicts a substantiality violation, this is interpreted as
evidence that five-month-old children have knowledge of substantiality.


Figure 9.27: Displays for an experiment demonstrating infant knowledge of substantiality. (Figure
7.8 from Spelke (1988).) Infants habituated to sequence (a) dishabituate more to sequence (c)
than to sequence (b). Since sequence (c) depicts a substantiality violation, this is interpreted as
evidence that four-month-old children have knowledge of substantiality.
Figure 9.28: Displays for an experiment testing infant knowledge of gravity. (Figure 7.9 from
Spelke (1988).) The conjecture was that infants habituated to sequence (a) would dishabituate
more to sequence (c) than to sequence (b), since sequence (c) depicts an unsupported object. This
expected result was not exhibited by four-month-old infants.
from chapter 7 is correct, and the meanings of so many everyday simple spatial motion verbs depend on
the concepts of gravity and support, then the knowledge of gravity and support must precede the onset
of language acquisition.

Spelke reports a fourth experiment, done jointly with Kestenbaum, that gives evidence that by
age four months, children know that objects must obey continuity. This experiment is depicted in
figure 9.29. Two groups of subjects participated in this experiment. The first group was habituated
to the scenario depicted in figure 9.29(a). In this scenario, an object passed behind one screen as
it moved from left to right, emerged from behind that screen, and then passed behind and emerged
from a second screen. The second group was habituated to a similar scenario except that no object
appeared in the gap between the screens. An object passed behind one screen and then emerged from
the second, as depicted in figure 9.29(b). Both groups received the same two dishabituation stimuli shown
in figures 9.29(c) and 9.29(d). One simply showed a single object without the screens while the other
showed two objects without the screens. The group habituated to (a) dishabituated more to (d), while
the group habituated to (b) dishabituated more to (c). The subjects appear to attribute scenario (a) to
a single object while attributing scenario (b) to two objects. This is interpreted as evidence that by age
four months, children know that objects must move along continuous paths, and furthermore, that a single
object cannot follow a continuous path without being visible in between the screens.

These  experiments  reported  by  Spelke  demonstrate  that  infants  at  a  very  early  age  possess  knowledge 
of  substantiality  and  continuity.  Furthermore,  they  use  this  knowledge  as  part  of  object  and  event 
perception.  She  offers  the  following  claim. 

The  principles  of  cohesion,  boundedness,  substance  and  spatio-temporal  continuity  appear  to 
stand  at  the  centre  of  adults'  intuitive  conceptions  of  the  physical  world  and  its  behaviour: 
our deepest conceptions of objects appear to be the notions that they are internally
connected and distinct from one another, that they occupy space, and that they exist and
move continuously (for further discussion, see Spelke 1983, 1987). These conceptions are so
central to human thinking about the physical world that their uniformity sometimes goes
unremarked. In studies of intuitive physical thought, for example, much attention is paid
to  the  idiosyncratic  and  error-ridden  predictions  adults  sometimes  make  about  the  motions 
of  objects  (e.g.  McCloskey  1983).  It  is  rarely  noted,  however,  that  adults  predict  with  near 
uniformity  that  objects  will  move  as  cohesive  wholes  on  connected  paths  through  unoccupied 
space.  This  conception,  at  least,  is  clear  and  central  to  our  thinking;  it  appears  to  have 
guided  our  thinking  since  early  infancy. 

[p.  181] 

She  then  goes  on  to  suggest  that  the  physical  knowledge  which  underlies  object  and  event  perception 
precedes  linguistic  development. 

In  this  context,  one  may  consider  the  possible  role  of  language  in  the  development  of  physical 
knowledge. Our research provides evidence, counter to the views of Quine (1960) and others,
that the organization of the world into objects precedes the development of language and
thus does not depend upon it. I suspect, moreover, that language plays no important role in
the  spontaneous  elaboration  of  physical  knowledge.  To  learn  that  objects  tend  to  move  at 
smooth  speeds,  for  example,  one  need  only  observe  objects  and  their  motions;  one  need  not 
articulate  the  principles  of  one's  theory  or  communicate  with  others  about  it. 

[p.  181] 

Spelke's work attempts to refute the claim that linguistic ability is needed to formulate physical
knowledge. This thesis carries Spelke's argument one step further. It suggests that physical knowledge is
needed to formulate linguistic concepts.
(a) Continuous habituation stimulus
(b) Discontinuous habituation stimulus
(c) One-object dishabituation stimulus
(d) Two-object dishabituation stimulus


Figure  9.29:  Displays  for  an  experiment  demonstrating  infant  knowledge  of  continuity.  (Figure  7.10 
from  Spelke  (1988).)  Infants  habituated  to  sequence  (a)  dishabituate  more  to  sequence  (d)  than 
to  sequence  (c),  while  infants  habituated  to  sequence  (b)  dishabituate  more  to  sequence  (c)  than 
to  sequence  (d).  This  is  interpreted  as  evidence  that  five-month-old  children  have  knowledge  that 
sequence  (a)  involves  the  continuous  motion  of  one  object,  while  sequence  (b)  must  involve  two 
objects. 
9.6  Summary

In chapter 7, I argued that the notions of support, contact, and attachment play a central role in the definitions of simple spatial motion verbs such as throw, pick up, put, and walk. In chapter 8, I presented a theory of how these notions can be grounded in perception via counterfactual simulation. A simple formulation of this theory has been implemented as a computer program called Abigail that watches movies constructed out of line segments and circles and uses counterfactual simulation to produce descriptions of the objects depicted in those movies, along with the changing status of support, contact, and attachment relations between those objects. In this chapter I have argued that counterfactual simulation is performed by a modular imagination capacity which directly encodes naive physical knowledge such as the substantiality, continuity, gravity, and ground plane constraints. I have argued that, by being based on these principles, the human imagination capacity operates in a very different fashion from conventional kinematic simulators. The incremental stepwise behavior of traditional kinematic simulators is both slow and cognitively implausible since it does not faithfully reflect the substantiality and continuity constraints. This chapter has presented an alternate simulation mechanism which, for a limited class of mechanisms, can directly predict in a single step that objects fall along continuous paths until they collide with obstacles in their path of motion. This mechanism appears better suited to the task of recovering support, contact, and attachment relations since the recovery of these relations appears to be based more on collision detection than on physical accuracy. Perhaps that is why human visual perception is more sensitive to the notions of substantiality and continuity than to velocity, momentum, and acceleration. While these mechanisms have to date been implemented only for the drastically simplified ontology of Abigail's micro-world, it appears that similar, though probably much more complex, variants of these mechanisms form the basis of the imagination capacity which drives human visual perception. Extending the mechanisms explored with Abigail to deal with more complex world ontologies remains for future work.


Chapter  10 

Conclusion 


10.1  Related  Work 

Computer models of event perception are not new. A number of previous attempts at producing event descriptions from animated movies have been described in the literature. Thibadeau (1986) describes a system that processes the movie created by Heider and Simmel (1944) and determines when events occur. The Heider and Simmel movie depicts two-dimensional geometric objects moving in a plane. When viewing that movie, most people project an elaborate story onto the motion of abstract objects. Thibadeau's system does not classify event types. It just produces a single binary function over time delineating when an 'event' is said to have occurred. Badler (1975) describes an unimplemented strategy for processing computer-generated animated line drawings to recover event descriptions. Badler's proposed system hierarchically recognizes predicates which are true over successively longer segments of the movie. His proposed system does not incorporate counterfactual simulation. The lowest-level predicates are computed geometrically on figures in a single frame of the movie. He thus does not have accurate methods for deriving support, contact, and attachment relations between objects. Adler (1977), Tsotsos (1977), Tsotsos and Mylopoulos (1979), Tsuji et al. (1977), Tsuji et al. (1979), Okada (1979), and Abe et al. (1981) describe systems similar to Badler's. Again, these systems do not incorporate counterfactual simulation and do not derive support, contact, and attachment relations between objects. Novak and Bulko (1990) describe a system for interpreting drawings depicting physics problems. Their system uses the linguistic description of the problem as an aid to the process of understanding the image. It cannot correctly interpret the image without the help of the linguistic description and thus, unlike Abigail, cannot be used as a model of the event perception mechanism that provides the non-linguistic input to the language acquisition device.

Kinematic  simulation  is  also  widely  discussed  in  the  literature,  though  it  has  never  been  applied  to 
the  task  of  event  perception.  While  most  of  the  work  falls  within  the  classic  approach  of  numerical 
integration,  two  notable  exceptions  are  the  work  of  Kramer  and  Funt. 


10.1.1  Kramer 

Kramer (1990a, 1990b) discusses a kinematic simulator called TLA. Like this thesis, Kramer eschews the classic approach based on numerical integration in favor of a more closed-form solution. He does so, however, for reasons of efficiency. Kramer is not concerned with cognitive modeling and plausibility. Like Abigail, TLA ignores dynamics. This includes velocity, momentum, kinetic energy, and the magnitude of forces acting on components.

On one hand, TLA is substantially more powerful than Abigail. Besides simulating three-dimensional movement constrained by a wide variety of joint types, TLA is able to handle closed-loop kinematic chains. Kramer presents TLA simulating a number of complex mechanisms, including a sofabed. It does so by constructing what Kramer calls an assembly plan, a procedure for incrementally satisfying the joint constraints of a mechanism, one by one, in a fashion which is analogous to assembling the mechanism in a given configuration. When a mechanism contains a closed-loop kinematic chain, there are constraints between the values of its joint parameters. Some independent set of joint parameters is taken as the driving inputs so that the values of the remaining dependent joint parameters are uniquely determined given particular values for those inputs. An assembly plan is thus a procedure for computing the values of dependent joint parameters from these driving inputs.¹ TLA operates by repeatedly assembling a mechanism for different values of the driving inputs. When a mechanism does not contain any closed-loop kinematic chains, its assembly plan is trivial: all of its flexible joint parameters are driving inputs and none are computed as dependent results. In essence, Abigail handles just this simple case. The novel contribution of TLA is an algorithm for deriving assembly plans for mechanisms with closed-loop kinematic chains.
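To make the notion of an assembly plan concrete, the following Python sketch works through the simplest closed-loop kinematic chain, a planar four-bar linkage. It is a hypothetical illustration, not Kramer's algorithm: the names `assemble_four_bar` and `branch` and the link layout (ground pivots at (0, 0) and (d, 0), a crank of length a driven at angle theta) are assumptions. The crank angle is the driving input. The joint between coupler and rocker is the dependent parameter, found by intersecting two circles.

```python
import math

def assemble_four_bar(theta, a, b, c, d, branch=+1):
    """Assemble a hypothetical planar four-bar linkage for one driving input.

    Ground pivots A=(0,0) and D=(d,0); crank AB of length a at angle theta;
    coupler BC of length b; rocker DC of length c. Returns (B, C), or None
    when no configuration closes the loop for this driving input.
    """
    # Position of the crank tip follows directly from the driving input.
    bx, by = a * math.cos(theta), a * math.sin(theta)
    dx, dy = d, 0.0
    # C must lie on the circle of radius b about B and radius c about D.
    ex, ey = dx - bx, dy - by
    dist = math.hypot(ex, ey)
    if dist == 0 or dist > b + c or dist < abs(b - c):
        return None  # the circles do not meet: the mechanism cannot be assembled
    # Distance from B, along BD, to the chord through both intersection points.
    along = (b * b - c * c + dist * dist) / (2 * dist)
    h = math.sqrt(max(b * b - along * along, 0.0))
    mx, my = bx + along * ex / dist, by + along * ey / dist
    # branch (+1 or -1) picks one of the two mirror-image configurations.
    cx = mx - branch * h * ey / dist
    cy = my + branch * h * ex / dist
    return (bx, by), (cx, cy)
```

The two values of `branch` give the two configurations in which the same driving input closes the loop, and a `None` result marks driving inputs for which no configuration exists. Cases like these are what make the choice of driving inputs in an assembly plan non-trivial.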

On the other hand, Abigail addresses issues that do not concern Kramer. Even ignoring dynamics, the motion of objects must obey a number of constraints in addition to those imposed by joints. These include substantiality, continuity, gravity, and ground plane, none of which are handled by TLA. In essence, TLA is an extremely sophisticated and competent analog of the inner loop of Abigail's simulator, which moves the foreground relative to the background. In Abigail this inner loop is trivial since she does not deal with closed-loop kinematic chains. The focus in Abigail, however, is on what is built on top of this inner loop: the mechanism for repeatedly choosing a foreground, deciding whether to translate or rotate this foreground, determining an appropriate translation axis or pivot point, and, most importantly, analytically determining how far to translate or rotate the foreground along that translation axis or about that pivot point until potential energy would increase or substantiality would be violated. This is one novel contribution of the kinematic simulator incorporated into Abigail, apart from all of the higher-level mechanisms which use that simulator to support event perception and the grounding of language in perception.
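The analytic step of computing how far a foreground can translate before substantiality is violated can be sketched for rigid line segments. This is an illustrative reconstruction under stated assumptions, not Abigail's code: the function names are hypothetical, the direction u is assumed to be a unit vector, and collinear (sliding) contact and potential energy are ignored. Under pure translation, first contact between two sets of segments occurs when a vertex of one set meets an edge of the other, so it suffices to cast rays from each moving vertex along u and from each fixed vertex along -u.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def cross(ax: float, ay: float, bx: float, by: float) -> float:
    return ax * by - ay * bx

def ray_hits_segment(p: Point, u: Point, seg: Segment) -> Optional[float]:
    """Distance t >= 0 at which p, moving along unit vector u, meets seg."""
    (qx, qy), (rx, ry) = seg
    ex, ey = rx - qx, ry - qy
    denom = cross(u[0], u[1], ex, ey)
    if denom == 0:
        return None  # parallel: collinear sliding contact is ignored here
    wx, wy = qx - p[0], qy - p[1]
    t = cross(wx, wy, ex, ey) / denom   # distance along the ray
    s = cross(wx, wy, u[0], u[1]) / denom  # fraction along the segment
    return t if t >= 0 and 0 <= s <= 1 else None

def max_translation(moving: List[Segment], fixed: List[Segment], u: Point) -> float:
    """How far the moving segments can translate along u before touching fixed ones."""
    best = float("inf")
    back = (-u[0], -u[1])
    # Moving vertices striking fixed edges.
    for seg_m in moving:
        for v in seg_m:
            for seg_f in fixed:
                t = ray_hits_segment(v, u, seg_f)
                if t is not None:
                    best = min(best, t)
    # Fixed vertices struck by moving edges (equivalently, moving along -u).
    for seg_f in fixed:
        for v in seg_f:
            for seg_m in moving:
                t = ray_hits_segment(v, back, seg_m)
                if t is not None:
                    best = min(best, t)
    return best
```

For a unit square whose bottom edge is at height 2 above a ground segment, dropping it along (0, -1) yields 2.0 in one computation. Contacts already present at distance 0 are reported as blocking, which is conservative when the motion would actually separate the objects.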

One may consider merging the two ideas together in an attempt to allow Abigail to understand images that contain closed-loop kinematic chains. This is actually much more complicated than it would seem at first glance. In Abigail's ontology, all motion follows either linear or circular paths. Furthermore, all objects are constructed from line segments and circles. Thus all motion limits can be found by computing the intersection of lines and circles. This is conceptually straightforward despite the myriad of cases, subcases, and boundary cases which must be considered to make it work. As the driving inputs of a mechanism with closed-loop kinematic chains are varied, however, its components follow paths which are substantially more complex. Merging TLA and Abigail would first require that TLA compute a representation of the path a point on an object would follow as a result of varying a driving input. Currently, TLA does not compute such representations. It only computes individual positions along the path given particular values for the driving inputs. Even if an explicit representation of paths were produced, two further capabilities would be needed to incorporate such a capacity into the simulation framework discussed in section 9.1. First, a method is needed to compute how far one can vary a driving input while still decreasing potential energy. Second, a method is needed for intersecting arbitrary paths. My guess is that this would be a substantial endeavor.
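For rotation, the moving point follows a circle rather than a line, so the corresponding motion limit comes from a circle-segment intersection. The sketch below is again hypothetical. It assumes counterclockwise rotation about a fixed pivot and returns the smallest positive angle at which a point meets a segment. Solving a quadratic for the parameter along the segment takes the place of any incremental stepping.

```python
import math

def first_rotation_contact(p, pivot, seg):
    """Smallest counterclockwise angle in (0, 2*pi] at which point p, rotating
    about pivot, meets segment seg; None if its circle never reaches seg."""
    cx, cy = pivot
    radius = math.hypot(p[0] - cx, p[1] - cy)
    (qx, qy), (rx, ry) = seg
    ex, ey = rx - qx, ry - qy
    fx, fy = qx - cx, qy - cy
    # Points q + s*e lying on the circle satisfy a*s^2 + b*s + c = 0.
    a = ex * ex + ey * ey
    b = 2 * (fx * ex + fy * ey)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return None
    v0 = (p[0] - cx, p[1] - cy)
    best = None
    root = math.sqrt(disc)
    for s in ((-b - root) / (2 * a), (-b + root) / (2 * a)):
        if not 0 <= s <= 1:
            continue  # the circle crosses the line outside the segment
        v1 = (fx + s * ex, fy + s * ey)  # contact point relative to the pivot
        ang = math.atan2(v0[0] * v1[1] - v0[1] * v1[0],
                         v0[0] * v1[0] + v0[1] * v1[1])
        if ang <= 1e-12:
            ang += 2 * math.pi  # measure counterclockwise, excluding the start
        best = ang if best is None else min(best, ang)
    return best
```

A point at (1, 0) rotating about the origin first meets the vertical segment from (0.5, 0) to (0.5, 2) after pi/3 radians.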

It is not clear that such an effort would be worthwhile. People might not have the ability to accurately simulate complex mechanisms as part of the hypothesized imagination capacity. While they clearly can predict, at least at a gross level, the behavior of mechanisms such as the ones in figures 9.23, 9.24, and 9.25, they might do so by some approximation method which removes the closed-loop kinematic chains. How this may be done is a topic for future research.

¹ A given set of joint parameters may be sufficient for uniquely determining a mechanism's configuration for some values of the parameters but not others. Thus an assembly plan must be flexible about which joint parameters it takes as driving inputs and which parameters it returns as computed results.

10.1.2  Funt 

Funt (1980) describes a system called Whisper that shares many of the same features as Abigail's imagination capacity. Like Abigail, Whisper can determine the support relationships between objects in a static image. Whisper can also predict the sequence of events that will occur during the collapse of a pile of objects depicted in a static image. Whisper differs from Abigail in one key detail, however. While Abigail represents images as collections of line segments and circles, Whisper instead represents images as bitmaps. Thus, unlike Abigail, Whisper can represent and operate on images containing arbitrarily shaped objects.

Whisper maintains two distinct bitmap representations of each image. One uses a conventional rectilinear layout of pixels. Funt calls this representation the diagram. The other uses a concentric layout of pixels which Funt calls the retina. Various transformation operations can be performed on an image in each representation. For example, objects in the diagram may be translated or rotated, a process which Funt calls redrawing the diagram. The concentric layout of the retina representation supports a number of efficient transformations, in particular rotation about the center of the retina. Funt allows the diagram representation to be converted to the retina representation but not vice versa. This process, called fixation, can specify a point in the diagram to be aligned with the center of the retina. Higher-level processes request sequences of fixation and transformation operations. These processes can also perform a number of query operations on the retina representation. Direct queries on the diagram are not supported. In addition to rotation about its center, the concentric layout of the retina representation supports several other efficient query operations. These include computing the center-of-area of an object, finding the points of contact between two objects, examining curves to find points of abrupt change in slope, determining whether an object is symmetric, and determining whether two objects have the same shape.

The higher-level supervisory processes determine support relationships and perform the simulation by issuing a sequence of transformations, fixations, and queries on the diagram and retina representations. In this respect Whisper is very similar to Abigail. Both Abigail and Whisper ignore dynamic effects of velocity, acceleration, momentum, moment of inertia, and kinetic energy during the simulation. Both assume that objects have a uniform density, which allows equating center-of-mass with center-of-area. More importantly, both perform simulation by a sequence of single-object translations and rotations, ignoring the possibility of simultaneous movement of multiple objects. Besides the inherent physical inaccuracy caused by this approach to simulation, Whisper, like Abigail, is unable to simulate scenarios with closed-loop kinematic chains.

Though Whisper is similar in intent to Abigail, and shares many of the same underlying assumptions and problems, Whisper also differs from Abigail in a number of key respects. First, as discussed previously, Whisper uses a bitmap representation while Abigail uses an edge-based representation. Second, Whisper only performs simulations and determines support relationships. It does not perform the higher-level tasks of event perception which in Abigail are built around the ability to perform such simulations by the methods described in chapter 8. Third, Whisper's ontology is strictly two-dimensional. It lacks any notion of a third dimension, even a restricted one such as the concept of 'layer' incorporated into Abigail. Furthermore, Whisper's ontology does not include the capability for objects to be fastened together by joints. Since its ontology lacks joints and layers, it has no need to infer such information from the image and thus has no analog to the model updating process described in section 8.2.1. A fourth and more significant difference between Abigail and Whisper is that while Abigail can determine analytically, in a single step, the maximal rotation or translation an object can undergo subject to substantiality constraints, Whisper operates more like a conventional simulator, repeatedly performing small transformations and checking for collisions after each transformation.
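The contrast between the two simulation styles can be seen in a one-dimensional toy in which an object's lowest point falls toward a floor. This is an illustrative assumption, not Whisper's or Abigail's code, and the function names are hypothetical. The stepwise version performs a collision check after each fixed increment and stops short of the floor by up to one step. The analytic version obtains the distance to contact in a single computation.

```python
def stepwise_drop(bottom_y, floor_y, step):
    """Conventional scheme: lower the object by fixed increments, checking for
    penetration of the floor before each one. Returns (distance moved, checks)."""
    moved, checks = 0.0, 0
    while True:
        checks += 1
        if bottom_y - (moved + step) < floor_y:  # next increment would penetrate
            return moved, checks
        moved += step

def analytic_drop(bottom_y, floor_y):
    """Single-step scheme: the distance to first contact is computed directly."""
    return bottom_y - floor_y
```

With bottom_y = 2.0, floor_y = 0.0, and step = 0.3, the stepwise version performs seven checks and stops about 0.2 units above the floor. The analytic version returns 2.0 at once, so the resulting contact is exact rather than approximate.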
It is interesting to note that Whisper incorporates a number of the same heuristics as Abigail that limit the choice of pivot points and translation axes. Furthermore, Whisper utilizes a notion of conglomeration, amalgamating several objects together for collective analysis of support relations, a concept which is analogous to that of clusters. Unlike Abigail, Whisper determines whether an object is supported without actually imagining it falling, by examining the relative positions of an object's center-of-mass and its support points. This method allows Whisper to determine support relationships for some, but not all, situations where Abigail would fail due to implied closed-loop kinematic chains.
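The center-of-mass test attributed to Whisper above can be sketched as follows. This is a simplified reading, not Funt's procedure. It assumes gravity acting straight down, uniform density (so center of mass equals center of area, as both systems assume), point contacts from below, and no friction or joints. Under these assumptions an object counts as statically supported when its center of area lies horizontally within the span of its support points. The function names are hypothetical.

```python
def center_of_area(poly):
    """Centroid of a simple polygon given as a list of (x, y) vertices,
    computed with the shoelace formula."""
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        w = x0 * y1 - x1 * y0
        area2 += w
        cx += (x0 + x1) * w
        cy += (y0 + y1) * w
    return cx / (3 * area2), cy / (3 * area2)

def statically_supported(poly, support_points):
    """Supported from below iff the centroid lies over the span of the supports."""
    if not support_points:
        return False
    cx, _ = center_of_area(poly)
    xs = [x for x, _ in support_points]
    return min(xs) <= cx <= max(xs)
```

A 2-by-1 rectangle resting on contacts at both bottom corners passes the test, while the same rectangle resting only on a contact at x = 0.5 fails it, since its centroid lies at x = 1.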


10.2  Discussion 

For pedagogical purposes, part II of this thesis has taken an extreme position on the representation of verb meanings. Chapter 7 has exaggerated the role played by the notions of support, contact, and attachment in order to motivate the event perception mechanisms presented in chapters 8 and 9. In doing so, it downplayed the notions of causality and force application which most prior approaches to lexical semantic representation (e.g. Miller 1972, Schank 1973, Jackendoff 1983, and Pinker 1989) have taken to be central to verb definitions. This thesis does not claim that the notions of support, contact, and attachment are sufficient to define verb meanings. Causality and force application, as well as numerous other notions, are needed to characterize word meanings in general, let alone the meanings of simple spatial motion verbs. Most of the words defined in chapter 7 (e.g. throw, fall, drop, bounce, jump, put, pick up, carry, raise, make, break, fix, step, and walk) have clear causal components, even though the definitions given there were able to circumvent the need for describing this causal component by sufficiently characterizing the non-causal aspects of the meanings of these verbs, namely the support, contact, and attachment relations they engender between objects participating in the events that they describe. This ability to ignore the causal component of verb meanings broke down for verbs like roll and slide in their transitive uses. Thus ultimately it will be necessary to incorporate causality into a comprehensive lexical semantic representation. Doing so will require an explanation of how to ground the notion of causality in perception.

It may be possible to extend the techniques described in chapters 8 and 9, namely counterfactual simulation, to support the perception of causality and force application. In essence, an object A can be said to cause an event e if e does actually happen in the observed world but does not happen in an imagined world where A either does not exist or moves differently than in the observed world. Imagining an alternate world without A can be accomplished using existing mechanisms in Abigail. The notion of 'moving differently', however, requires extending Abigail's ontology to support animate objects. Animate (or at least motile) objects are those which appear to move on their own initiative. Such motion occurs because parts of animate objects exert forces relative to other parts. Within the limited ontology of Abigail's micro-world, such relative motion of animate object parts could be modeled completely using joints which exert forces to change their parameters. Currently, gravity is the only force incorporated into Abigail. Abigail could be extended to model joint forces in addition to gravity. This would require several changes. First, the joint model maintained by Abigail must be extended to contain a representation of the changing forces exerted by each joint as a function of time. The changing force profile of the joints comprising an object A can be said to be the motor program executed by A. To model grasping and releasing, the motor program must have the capacity for representing the creation and dissolution of joints in addition to changing force profiles. Second, the imagination capacity must be extended to take a motor program as input in addition to a set of figures, a joint model, and a layer model. Such an extended imagination capacity would model the short-term future of the world under the effects of gravity, assuming that each animate object executed the motor program given as input. Modeling the execution of motor programs would require a kinematic simulator that was more faithful to the time course of simulation than the simulator currently used in Abigail. Third, since the motor programs executed by animate objects in the world are not directly observable, Abigail must be provided with mechanisms for hypothesizing these motor programs. Such mechanisms would be analogous to those currently used by Abigail for updating her joint and layer models. Motor programs could be recovered by counterfactual simulation. Informally, Abigail would incorporate into the hypothesized motor programs only those force applications which were needed to have the imagined world match the observed world. Finally, a primitive (cause x e) could be added to the lexical semantic representation described in chapter 7. Actually, there seem to be at least three distinct notions of causality. The first expresses the fact that the existence of an object x caused an event e. Such a causal relation is true if e occurs in the observed world but does not occur in a world imagined without x. Given this notion of causality, the two-argument primitive (supports x y) can be reformulated as (cause x (supported y)). The second expresses the fact that the motion of an animate object x, namely the motion caused by the execution of its motor program, caused an event e. Such a causal relation is true if e occurs in the observed world but does not occur in a world imagined where x does not execute its motor program. During such counterfactual simulation, x would keep rigid all of the joints which it would have moved according to the motor program recovered from the observed world. The third variant of causality expresses the fact that the involuntary motion of an object x caused an event e. Such involuntary motion occurs not because of a motor program executed by x but rather as a result of either gravity, a motor program executed by some other object, or a combination of the two.
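The first notion of causality above, and the reformulation of (supports x y) as (cause x (supported y)), can be illustrated with a toy counterfactual test. The world model here is a hypothetical stand-in for Abigail's imagination capacity. Each block simply records what it rests on, and imagining the world without x deletes x, so anything resting on x, directly or indirectly, is no longer supported.

```python
from typing import Callable, Dict

# Hypothetical toy world: each block maps to the object it rests on.
World = Dict[str, str]

def supported(world: World, y: str) -> bool:
    """y stays put iff its chain of resting objects reaches the table."""
    seen = set()
    while y != "table":
        if y not in world or y in seen:
            return False  # a missing support: y would fall in this world
        seen.add(y)
        y = world[y]
    return True

def without(world: World, x: str) -> World:
    """Imagine the world with object x removed."""
    return {k: v for k, v in world.items() if k != x}

def causes(x: str, event: Callable[[World], bool], world: World) -> bool:
    """First notion of causality: the event holds in the observed world but
    not in a world imagined without x."""
    return event(world) and not event(without(world, x))
```

In a stack where A rests on the table, B on A, and C on B, A causes C to be supported (so A supports C in the reformulated sense), while C does not cause B to be supported.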

Putting these speculative ideas aside, there are several important areas of continued work along the main themes advanced in part II of this thesis. First, to date Abigail has only processed a portion of a single movie. Additional work is needed to improve the robustness and performance of the imagination capacity and event perception mechanisms to allow Abigail to successfully process many movies. Second, Abigail currently does not produce complete semantic descriptions of events such as those presented in chapter 7. While she does recover perceptual primitives, including the notions of support, contact, and attachment, she does not aggregate these primitives into event expressions. It would be fairly straightforward to incorporate a lexicon of event expressions into Abigail and have her continually assess which of these known event types were currently happening in the movie. A number of prior approaches to event perception (e.g. Badler 1975) utilized such a lexicon of event types. A more satisfying approach would not rely on a predefined set of event types but instead would be able to learn the appropriate event lexicon. The event lexicon might be acquired by noticing recurring sequences of perceptual primitives in the movie. Alternatively, there may be universal and perhaps innate principles that govern the aggregation of perceptual primitives into discrete events. Discerning the nature of such principles and testing their validity by building computational models awaits further research. Finally, Abigail is currently not integrated with any language processing facility. The original goal that motivated the work on event perception described in part II of this thesis was the desire to ground the language acquisition task advanced in part I in a realistic lexical semantic representation which could be shown to be recoverable from visual input. In order to attempt the integration of the two halves of this thesis, it is first necessary to successfully accomplish the first two tasks outlined above. Additionally, one must formulate a suitable linking rule for the semantic representation produced by the aggregation process described above. This linking rule must then be inverted in a fashion similar to the way the Jackendovian linking rule was inverted in section 3.1. This inverted linking rule could then be combined with a hybrid language acquisition model based on the syntactic theory of Kenunia but utilizing a more elaborate semantic representation with a fracturing rule along the lines of Maimra and Davra. The substantial effort of building such a comprehensive computational model of language acquisition remains for future work. Nonetheless, this thesis has taken a modest first step in this direction by elaborating a framework for approaching this task and demonstrating detailed working implementations of a number of crucial components that will ultimately be needed to construct such language acquisition models.
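The lexicon-based approach described above as fairly straightforward might look like the following sketch. The event definitions and primitive names are hypothetical, not the definitions of chapter 7. Each event type is a sequence of phases, each phase is a set of perceptual primitives that must all hold in some frame, and the phases must be satisfied in temporal order.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Primitive = Tuple[str, ...]             # e.g. ("supports", "table", "ball")
Frame = FrozenSet[Primitive]            # primitives true in one movie frame
EventType = Sequence[FrozenSet[Primitive]]  # phases that must hold in order

def occurs(event: EventType, frames: Sequence[Frame]) -> bool:
    """True iff each phase holds in some frame, in increasing frame order.
    Matching each phase at its earliest possible frame is sufficient."""
    i = 0
    for phase in event:
        while i < len(frames) and not phase <= frames[i]:
            i += 1
        if i == len(frames):
            return False
        i += 1  # the next phase must hold in a later frame
    return True

def recognize(lexicon: Dict[str, EventType], frames: Sequence[Frame]) -> List[str]:
    """Names of all event types in the lexicon that occur in the movie."""
    return [name for name, event in lexicon.items() if occurs(event, frames)]

# Hypothetical example lexicon over hypothetical primitives.
S, C, A = "supports", "contacts", "attached"
EXAMPLE_LEXICON: Dict[str, EventType] = {
    "pick up": [frozenset({(S, "table", "ball"), (C, "hand", "ball")}),
                frozenset({(A, "hand", "ball")})],
    "put down": [frozenset({(A, "hand", "ball")}),
                 frozenset({(S, "table", "ball")})],
}
```

On a movie in which the ball rests on the table, is touched by the hand, and is then attached to the hand, only "pick up" is recognized. Learning such a lexicon rather than predefining it is the harder problem raised in the text.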
Appendix  A


Maimra  in  Operation 


This appendix contains a trace of Maimra processing the corpus from figure 1.2 using the grammar from figure 4.1. The final lexicon produced for this run is illustrated in figure A.1. This trace depicts Maimra processing the corpus, utterance by utterance, producing first a disjunctive parse tree for each utterance and then a disjunctive lexicon formula for that utterance.
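The fracture step and the cross-situational filtering visible in the trace below can be approximated by the following sketch. It is a simplified reconstruction, not Maimra's implementation. It ignores syntactic categories and parsing and handles only two-word noun-verb utterances. The noun may take any argument subexpression of a candidate meaning, while the verb takes the remainder with ?0 in its place, and consistent lexicons are found by brute force. All function names are hypothetical. In a few trace entries Maimra places ?0 first inside PATH, whereas the sketch preserves argument order; this does not change the outcome shown below.

```python
from typing import Dict, Iterator, List, Tuple, Union

Expr = Union[str, tuple]  # a symbol, or (head, arg1, arg2, ...)
VAR = "?0"

def argument_paths(expr: Expr) -> Iterator[Tuple[int, ...]]:
    """Paths to every argument position at any depth (heads are skipped)."""
    if isinstance(expr, tuple):
        for i in range(1, len(expr)):
            yield (i,)
            for rest in argument_paths(expr[i]):
                yield (i,) + rest

def get_at(expr: Expr, path: Tuple[int, ...]) -> Expr:
    for i in path:
        expr = expr[i]
    return expr

def replace_at(expr: Expr, path: Tuple[int, ...], new: Expr) -> Expr:
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace_at(expr[i], path[1:], new),) + expr[i + 1:]

def fracture(expr: Expr) -> List[Tuple[Expr, Expr]]:
    """Split expr into (argument meaning, functor meaning containing ?0)."""
    return [(get_at(expr, p), replace_at(expr, p, VAR)) for p in argument_paths(expr)]

def alternatives(noun: str, verb: str, lcs: List[Expr]) -> List[Dict[str, Expr]]:
    """Disjunctive lexicon formula for one two-word utterance."""
    return [{noun: n, verb: v} for e in lcs for n, v in fracture(e)]

def learn(corpus) -> List[Dict[str, Expr]]:
    """Keep every lexicon consistent with some alternative of each utterance."""
    hypotheses: List[Dict[str, Expr]] = [{}]
    for (noun, verb), lcs in corpus:
        merged_all: List[Dict[str, Expr]] = []
        for h in hypotheses:
            for alt in alternatives(noun, verb, lcs):
                if all(h.get(w, m) == m for w, m in alt.items()):
                    merged = {**h, **alt}
                    if merged not in merged_all:
                        merged_all.append(merged)
        hypotheses = merged_all
    return hypotheses

def motion(mover: str, source: str, goal: str) -> List[Expr]:
    """The six candidate meanings the trace pairs with each motion scene."""
    return [("BE", mover, ("AT", source)),
            ("GO", mover, ("PATH", ("FROM", source), ("TO", goal))),
            ("GO", mover, ("FROM", source)),
            ("GO", mover, ("TO", goal)),
            ("GO", mover, ("PATH",)),
            ("BE", mover, ("AT", goal))]
```

Applied to the first three utterances of the trace, the sketch yields 20 alternatives for JOHN ROLLED, keeps the same three candidate meanings for ROLLED after MARY ROLLED, and converges on ROLLED = (GO ?0 (PATH)) after BILL ROLLED, matching the fracture shown for that utterance.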

lcs: (OR (BE PERSON1 (AT PERSON3))
         (GO PERSON1 (PATH (FROM PERSON3) (TO PERSON2)))
         (GO PERSON1 (FROM PERSON3))
         (GO PERSON1 (TO PERSON2))
         (GO PERSON1 (PATH))
         (BE PERSON1 (AT PERSON2)))
sentence: (JOHN ROLLED)
parse: (S (NP (N JOHN)) (VP (V ROLLED)))
fracture:
(OR (AND (DEFINITION JOHN N PERSON3)
         (DEFINITION ROLLED V (BE PERSON1 (AT ?0))))
    (AND (DEFINITION JOHN N (AT PERSON3))
         (DEFINITION ROLLED V (BE PERSON1 ?0)))
    (AND (DEFINITION JOHN N PERSON1)
         (DEFINITION ROLLED V (BE ?0 (AT PERSON3))))
    (AND (DEFINITION JOHN N PERSON2)
         (DEFINITION ROLLED V (GO PERSON1 (PATH (FROM PERSON3) (TO ?0)))))
    (AND (DEFINITION JOHN N (TO PERSON2))
         (DEFINITION ROLLED V (GO PERSON1 (PATH ?0 (FROM PERSON3)))))
    (AND (DEFINITION JOHN N PERSON3)
         (DEFINITION ROLLED V (GO PERSON1 (PATH (FROM ?0) (TO PERSON2)))))
    (AND (DEFINITION JOHN N (FROM PERSON3))
         (DEFINITION ROLLED V (GO PERSON1 (PATH ?0 (TO PERSON2)))))
    (AND (DEFINITION JOHN N (PATH (FROM PERSON3) (TO PERSON2)))
         (DEFINITION ROLLED V (GO PERSON1 ?0)))
    (AND (DEFINITION JOHN N PERSON1)
         (DEFINITION ROLLED V (GO ?0 (PATH (FROM PERSON3) (TO PERSON2)))))
    (AND (DEFINITION JOHN N PERSON3)
         (DEFINITION ROLLED V (GO PERSON1 (FROM ?0))))
    (AND (DEFINITION JOHN N (FROM PERSON3))
         (DEFINITION ROLLED V (GO PERSON1 ?0)))
    (AND (DEFINITION JOHN N PERSON1)
         (DEFINITION ROLLED V (GO ?0 (FROM PERSON3))))
    (AND (DEFINITION JOHN N PERSON2)
         (DEFINITION ROLLED V (GO PERSON1 (TO ?0))))
    (AND (DEFINITION JOHN N (TO PERSON2))
         (DEFINITION ROLLED V (GO PERSON1 ?0)))
    (AND (DEFINITION JOHN N PERSON1)
         (DEFINITION ROLLED V (GO ?0 (TO PERSON2))))
    (AND (DEFINITION JOHN N (PATH))
         (DEFINITION ROLLED V (GO PERSON1 ?0)))
    (AND (DEFINITION JOHN N PERSON1)
         (DEFINITION ROLLED V (GO ?0 (PATH))))
    (AND (DEFINITION JOHN N PERSON2)
         (DEFINITION ROLLED V (BE PERSON1 (AT ?0))))
    (AND (DEFINITION JOHN N (AT PERSON2))
         (DEFINITION ROLLED V (BE PERSON1 ?0)))
    (AND (DEFINITION JOHN N PERSON1)
         (DEFINITION ROLLED V (BE ?0 (AT PERSON2)))))

lcs: (OR (BE PERSON2 (AT PERSON3))
         (GO PERSON2 (PATH (FROM PERSON3) (TO PERSON1)))
         (GO PERSON2 (FROM PERSON3))
         (GO PERSON2 (TO PERSON1))
         (GO PERSON2 (PATH))
         (BE PERSON2 (AT PERSON1)))
sentence: (MARY ROLLED)
parse: (S (NP (N MARY)) (VP (V ROLLED)))
fracture: (OR (AND (DEFINITION MARY N PERSON2)
                   (DEFINITION ROLLED V (BE ?0 (AT PERSON3))))
              (AND (DEFINITION MARY N PERSON2)
                   (DEFINITION ROLLED V (GO ?0 (FROM PERSON3))))
              (AND (DEFINITION MARY N PERSON2)
                   (DEFINITION ROLLED V (GO ?0 (PATH)))))

lcs: (OR (BE PERSON3 (AT PERSON1))
         (GO PERSON3 (PATH (FROM PERSON1) (TO PERSON2)))
         (GO PERSON3 (FROM PERSON1))
         (GO PERSON3 (TO PERSON2))
         (GO PERSON3 (PATH))
         (BE PERSON3 (AT PERSON2)))
sentence: (BILL ROLLED)
parse: (S (NP (N BILL)) (VP (V ROLLED)))
fracture: (AND (DEFINITION BILL N PERSON3)
               (DEFINITION ROLLED V (GO ?0 (PATH))))



lcs: (OR (BE OBJECT1 (AT PERSON1))
(GO OBJECT1 (PATH (FROM PERSON1) (TO PERSON2)))
(GO OBJECT1 (FROM PERSON1))
(GO OBJECT1 (TO PERSON2))
(GO OBJECT1 (PATH))
(BE OBJECT1 (AT PERSON2)))
sentence: (THE CUP ROLLED)
parse: (OR (S (OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))
(NP (N THE) (VP (V CUP)))
(NP (N THE) (PP (P CUP))))
(VP (V ROLLED)))
(S (NP (N THE))
(OR (VP (OR (AUX (DO CUP))
(AUX (BE CUP))
(AUX (MODAL CUP))
(AUX (TO CUP))
(AUX (HAVE CUP)))
(V ROLLED))
(VP (V CUP) (VP (V ROLLED))))))
fracture: (OR (AND (DEFINITION THE N OBJECT1)
(OR (DEFINITION CUP HAVE SEMANTICLESS)
(DEFINITION CUP TO SEMANTICLESS)
(DEFINITION CUP MODAL SEMANTICLESS)
(DEFINITION CUP BE SEMANTICLESS)
(DEFINITION CUP DO SEMANTICLESS))
(DEFINITION ROLLED V (GO ?0 (PATH))))
(AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION ROLLED V (GO ?0 (PATH)))))
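The fracture for THE CUP ROLLED is an AND/OR formula: with THE as a noun there are five ways of treating CUP as a semanticless auxiliary, and there is one reading with THE as a determiner and CUP as a noun. A short sketch that counts the lexicon fragments such a formula permits follows, assuming the conjuncts of each AND are independent choices and that formulas are encoded as nested tuples; count_alternatives is a hypothetical name.

```python
from math import prod


def count_alternatives(formula) -> int:
    """Count the lexicon fragments an AND/OR fracture formula allows.

    OR adds the counts of its branches, AND multiplies them (its conjuncts
    are treated as independent choices), and anything else, such as a
    DEFINITION term, is one fixed commitment.
    """
    if isinstance(formula, tuple) and formula[:1] == ("OR",):
        return sum(count_alternatives(f) for f in formula[1:])
    if isinstance(formula, tuple) and formula[:1] == ("AND",):
        return prod(count_alternatives(f) for f in formula[1:])
    return 1


ROLLED = ("DEFINITION", "ROLLED", "V", ("GO", "?0", ("PATH",)))
the_cup_rolled = (
    "OR",
    ("AND",
     ("DEFINITION", "THE", "N", "OBJECT1"),
     ("OR", *(("DEFINITION", "CUP", cat, "SEMANTICLESS")
              for cat in ("HAVE", "TO", "MODAL", "BE", "DO"))),
     ROLLED),
    ("AND",
     ("DEFINITION", "THE", "DET", "SEMANTICLESS"),
     ("DEFINITION", "CUP", "N", "OBJECT1"),
     ROLLED),
)
print(count_alternatives(the_cup_rolled))  # 6
```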
lcs: (OR (BE PERSON3 (AT PERSON1))
(GO PERSON3 (PATH (FROM PERSON1) (TO PERSON2)))
(GO PERSON3 (FROM PERSON1))
(GO PERSON3 (TO PERSON2))
(GO PERSON3 (PATH))
(BE PERSON3 (AT PERSON2)))
sentence: (BILL RAN TO MARY)
parse: (OR (S (OR (NP (N BILL) (NP (N RAN)))
(NP (N BILL) (VP (V RAN)))
(NP (N BILL) (PP (P RAN))))
(VP (V TO) (NP (N MARY))))
(S (NP (N BILL))
(OR (VP (V RAN) (PP (P TO)) (NP (N MARY)))
(VP (V RAN) (VP (V TO)) (NP (N MARY)))
(VP (V RAN) (NP (N TO)) (NP (N MARY)))
(VP (OR (AUX (DO RAN))
(AUX (BE RAN))
(AUX (MODAL RAN))
(AUX (TO RAN))
(AUX (HAVE RAN)))
(V TO)
(NP (N MARY)))
(VP (V RAN)
(OR (NP (DET TO) (N MARY))
(NP (N TO) (NP (N MARY)))))
(VP (V RAN) (VP (V TO) (NP (N MARY))))
(VP (V RAN) (PP (P TO) (NP (N MARY)))))))


fracture:
(OR (AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION MARY N PERSON2)
(DEFINITION TO P (TO ?0))
(DEFINITION RAN V (GO ?0 (PATH ?1 (FROM PERSON1)))))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO P (PATH (FROM PERSON1) (TO ?0)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO V (TO ?0))
(DEFINITION RAN V (GO ?0 (PATH ?1 (FROM PERSON1)))))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO V (PATH (FROM PERSON1) (TO ?0)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION TO DET SEMANTICLESS)
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (PATH (FROM PERSON1) (TO ?1)))))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO N (TO ?0))
(DEFINITION RAN V (GO ?0 (PATH ?1 (FROM PERSON1)))))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO N (PATH (FROM PERSON1) (TO ?0)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION MARY N PERSON2)
(DEFINITION TO V (GO ?0 (PATH (FROM PERSON1) (TO ?1)))))
(AND (DEFINITION TO N PERSON1)
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))
(AND (DEFINITION TO N (FROM PERSON1))
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (PATH ?1 (TO ?2)))))
(AND (DEFINITION TO V PERSON1)
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))
(AND (DEFINITION TO V (FROM PERSON1))
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (PATH ?1 (TO ?2)))))
(AND (DEFINITION TO P PERSON1)
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))
(AND (DEFINITION TO P (FROM PERSON1))
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (PATH ?1 (TO ?2)))))))
(AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION MARY N PERSON2)
(DEFINITION TO P (TO ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO V (TO ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION TO DET SEMANTICLESS)
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (GO ?0 (TO ?1))))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO N (TO ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION MARY N PERSON2)
(DEFINITION TO V (GO ?0 (TO ?1))))))
(AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION MARY N PERSON2)
(DEFINITION TO P (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO V (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (DEFINITION TO DET SEMANTICLESS)
(DEFINITION MARY N PERSON2)
(DEFINITION RAN V (BE ?0 (AT ?1))))
(AND (DEFINITION MARY N PERSON2)
(DEFINITION TO N (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION MARY N PERSON2)
(DEFINITION TO V (BE ?0 (AT ?1)))))))
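Every alternative in the fracture above is a set of word meanings that, once composed, should reproduce one of the lcs disjuncts for the scene. The sketch below illustrates that composition check by substituting argument variables. The linking rule it uses (MARY fills ?0 of TO; BILL fills ?0 of RAN and the meaning of TO MARY fills ?1) is an assumption made for illustration, as are the tuple encoding and the names substitute and sentence_meaning; it is not a statement of MAIMRA's linking rule.

```python
from typing import Dict, Union

# Hypothetical encoding: atoms are strings, compound LCS terms are tuples.
Term = Union[str, tuple]


def substitute(term: Term, bindings: Dict[str, Term]) -> Term:
    """Replace each variable (?0, ?1, ...) in `term` by the term bound to it."""
    if isinstance(term, str):
        return bindings.get(term, term)
    return tuple(substitute(part, bindings) for part in term)


def sentence_meaning(ran: Term, to: Term,
                     bill: Term = "PERSON3", mary: Term = "PERSON2") -> Term:
    """Compose BILL RAN TO MARY under the assumed linking rule: MARY fills ?0
    of TO, then BILL fills ?0 of RAN and the meaning of TO MARY fills ?1."""
    to_mary = substitute(to, {"?0": mary})
    return substitute(ran, {"?0": bill, "?1": to_mary})


# The six disjuncts of the lcs printed for BILL RAN TO MARY.
LCS = {
    ("BE", "PERSON3", ("AT", "PERSON1")),
    ("GO", "PERSON3", ("PATH", ("FROM", "PERSON1"), ("TO", "PERSON2"))),
    ("GO", "PERSON3", ("FROM", "PERSON1")),
    ("GO", "PERSON3", ("TO", "PERSON2")),
    ("GO", "PERSON3", ("PATH",)),
    ("BE", "PERSON3", ("AT", "PERSON2")),
}

print(sentence_meaning(("GO", "?0", "?1"), ("TO", "?0")) in LCS)  # True
print(sentence_meaning(("BE", "?0", "?1"), ("AT", "?0")) in LCS)  # True
print(sentence_meaning(("BE", "?0", "?1"), ("TO", "?0")) in LCS)  # False
```

The first two pairings (TO P (TO ?0) with RAN V (GO ?0 ?1), and TO P (AT ?0) with RAN V (BE ?0 ?1)) are printed alternatives; the third mixes entries from different alternatives and yields (BE PERSON3 (TO PERSON2)), which matches no disjunct.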


lcs: (OR (BE PERSON3 (AT PERSON1))
(GO PERSON3 (PATH (FROM PERSON1) (TO PERSON2)))
(GO PERSON3 (FROM PERSON1))
(GO PERSON3 (TO PERSON2))
(GO PERSON3 (PATH))
(BE PERSON3 (AT PERSON2)))
sentence: (BILL RAN FROM JOHN)
parse: (OR (S (NP (N BILL) (VP (V RAN))) (VP (V FROM) (NP (N JOHN))))
(S (NP (N BILL))
(OR (VP (V RAN) (PP (P FROM)) (NP (N JOHN)))
(VP (V RAN) (VP (V FROM)) (NP (N JOHN)))
(VP (V RAN) (NP (N FROM)) (NP (N JOHN)))
(VP (OR (AUX (DO RAN))
(AUX (BE RAN))
(AUX (MODAL RAN))
(AUX (TO RAN))
(AUX (HAVE RAN)))
(V FROM)
(NP (N JOHN)))
(VP (V RAN)
(OR (NP (DET FROM) (N JOHN))
(NP (N FROM) (NP (N JOHN)))))
(VP (V RAN) (VP (V FROM) (NP (N JOHN))))
(VP (V RAN) (PP (P FROM) (NP (N JOHN)))))))
APPENDIX A. MAIMRA IN OPERATION


fracture:
(OR (AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM P (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (DEFINITION FROM DET SEMANTICLESS)
(DEFINITION JOHN N PERSON1)
(DEFINITION RAN V (BE ?0 (AT ?1))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM N (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (BE ?0 (AT ?1))))))
(AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM P (PATH (FROM ?0) (TO PERSON2)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (PATH (FROM ?0) (TO PERSON2)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM N (PATH (FROM ?0) (TO PERSON2)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (GO ?0 (PATH (FROM ?1) (TO PERSON2)))))
(AND (DEFINITION FROM N PERSON2)
(DEFINITION JOHN N PERSON1)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))
(AND (DEFINITION FROM V PERSON2)
(DEFINITION JOHN N PERSON1)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))
(AND (DEFINITION FROM P PERSON2)
(DEFINITION JOHN N PERSON1)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))))
(AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM P (FROM ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (FROM ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM N (FROM ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (GO ?0 (FROM ?1)))))))
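The parse field is a packed forest: an OR node marks a point where the grammar admits several analyses of the same span, and each choice point multiplies the number of complete trees. The sketch below unpacks such a forest into individual OR-free trees. It transcribes the BILL RAN FROM JOHN parse printed before this fracture into nested tuples (an assumed encoding), and the helper names expand and noun_phrase are hypothetical; under these assumptions it finds 13 trees.

```python
from itertools import product
from typing import Iterator, Union

# Hypothetical encoding: a packed parse forest is a nested tuple whose head is
# a category label, a word (str) at the leaves, or "OR" for a choice point.
Tree = Union[str, tuple]


def expand(node: Tree) -> Iterator[Tree]:
    """Yield every OR-free parse tree packed into `node`."""
    if isinstance(node, str):
        yield node
    elif node[0] == "OR":
        for alternative in node[1:]:
            yield from expand(alternative)
    else:
        label, *children = node
        # One independent choice per child: take the Cartesian product.
        for combo in product(*(list(expand(child)) for child in children)):
            yield (label, *combo)


def noun_phrase(word: str) -> tuple:
    return ("NP", ("N", word))


aux = ("OR", *(("AUX", (kind, "RAN")) for kind in ("DO", "BE", "MODAL", "TO", "HAVE")))
forest = ("OR",
          ("S",
           ("NP", ("N", "BILL"), ("VP", ("V", "RAN"))),
           ("VP", ("V", "FROM"), noun_phrase("JOHN"))),
          ("S",
           noun_phrase("BILL"),
           ("OR",
            ("VP", ("V", "RAN"), ("PP", ("P", "FROM")), noun_phrase("JOHN")),
            ("VP", ("V", "RAN"), ("VP", ("V", "FROM")), noun_phrase("JOHN")),
            ("VP", ("V", "RAN"), noun_phrase("FROM"), noun_phrase("JOHN")),
            ("VP", aux, ("V", "FROM"), noun_phrase("JOHN")),
            ("VP", ("V", "RAN"),
             ("OR", ("NP", ("DET", "FROM"), ("N", "JOHN")),
                    ("NP", ("N", "FROM"), noun_phrase("JOHN")))),
            ("VP", ("V", "RAN"), ("VP", ("V", "FROM"), noun_phrase("JOHN"))),
            ("VP", ("V", "RAN"), ("PP", ("P", "FROM"), noun_phrase("JOHN"))))))

trees = list(expand(forest))
print(len(trees))  # 13
```

Keeping the forest packed, as the trace does, avoids listing shared structure once per tree; the count grows multiplicatively with the number of independent choice points.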




lcs: (OR (BE PERSON3 (AT PERSON1))
(GO PERSON3 (PATH (FROM PERSON1) (TO OBJECT1)))
(GO PERSON3 (FROM PERSON1))
(GO PERSON3 (TO OBJECT1))
(GO PERSON3 (PATH))
(BE PERSON3 (AT OBJECT1)))
sentence: (BILL RAN TO THE CUP)
parse: (OR (S (NP (N BILL) (VP (V RAN)))
(OR (VP (V TO) (NP (N THE)) (NP (N CUP)))
(VP (V TO)
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))))))
(S (NP (N BILL))
(OR (VP (V RAN) (PP (P TO) (NP (N THE))) (NP (N CUP)))
(VP (V RAN) (VP (V TO) (NP (N THE))) (NP (N CUP)))
(VP (V RAN)
(OR (NP (DET TO) (N THE))
(NP (N TO) (NP (N THE))))
(NP (N CUP)))
(VP (OR (AUX (DO RAN))
(AUX (BE RAN))
(AUX (MODAL RAN))
(AUX (TO RAN))
(AUX (HAVE RAN)))
(V TO)
(NP (N THE))
(NP (N CUP)))
(VP (V RAN) (NP (N TO)) (NP (N THE)) (NP (N CUP)))
(VP (V RAN) (VP (V TO)) (NP (N THE)) (NP (N CUP)))
(VP (V RAN) (PP (P TO)) (NP (N THE)) (NP (N CUP)))
(VP (V RAN) (PP (P TO))
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))))
(VP (V RAN) (VP (V TO))
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))))
(VP (V RAN) (NP (N TO))
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))))
(VP (OR (AUX (DO RAN))
(AUX (BE RAN))
(AUX (MODAL RAN))
(AUX (TO RAN))
(AUX (HAVE RAN)))
(V TO)
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))))


(VP (V RAN)
(OR (NP (N TO) (NP (N THE)) (NP (N CUP)))
(NP (DET TO) (N THE) (NP (N CUP)))
(NP (N TO)
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))))))
(VP (V RAN)
(OR (VP (V TO) (NP (N THE)) (NP (N CUP)))
(VP (V TO)
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP)))))))
(VP (V RAN)
(OR (PP (P TO) (NP (N THE)) (NP (N CUP)))
(PP (P TO)
(OR (NP (DET THE) (N CUP))
(NP (N THE) (NP (N CUP))))))))))




fracture:
(OR (AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO P (PATH (FROM PERSON1) (TO ?0)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO V (PATH (FROM PERSON1) (TO ?0)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO N (PATH (FROM PERSON1) (TO ?0)))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO V (GO ?0 (PATH (FROM PERSON1) (TO ?1)))))
(AND (DEFINITION TO N PERSON1)
(DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))
(AND (DEFINITION TO V PERSON1)
(DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))
(AND (DEFINITION TO P PERSON1)
(DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION RAN V (GO ?0 (PATH (FROM ?1) (TO ?2)))))))


(AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO P (TO ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO V (TO ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO N (TO ?0))
(DEFINITION RAN V (GO ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO V (GO ?0 (TO ?1))))))
(AND (DEFINITION BILL N PERSON3)
(OR (AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO P (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO V (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO N (AT ?0))
(DEFINITION RAN V (BE ?0 ?1)))
(AND (OR (DEFINITION RAN HAVE SEMANTICLESS)
(DEFINITION RAN TO SEMANTICLESS)
(DEFINITION RAN MODAL SEMANTICLESS)
(DEFINITION RAN BE SEMANTICLESS)
(DEFINITION RAN DO SEMANTICLESS))
(DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(DEFINITION TO V (BE ?0 (AT ?1)))))))
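In the fracture for BILL RAN TO THE CUP, the entries BILL N PERSON3, THE DET SEMANTICLESS and CUP N OBJECT1 occur in every printed alternative; only the entries for TO and RAN vary. The sketch below extracts such utterance-wide commitments from a flattened list of alternatives. It transcribes five of the printed alternatives (ones without a nested OR), keeps meanings as the printed strings, and the names determined_entries and alternative are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# (word, category, meaning), with the meaning kept as the printed string.
Entry = Tuple[str, str, str]


def determined_entries(alternatives: List[FrozenSet[Entry]]) -> Set[Entry]:
    """Entries shared by every alternative: whichever alternative is correct,
    the utterance already commits the learner to these."""
    if not alternatives:
        return set()
    common = set(alternatives[0])
    for alt in alternatives[1:]:
        common &= alt
    return common


def alternative(to_entry: Entry, ran_entry: Entry) -> FrozenSet[Entry]:
    # In the printed alternatives BILL, THE and CUP always get these entries.
    return frozenset({("BILL", "N", "PERSON3"), ("THE", "DET", "SEMANTICLESS"),
                      ("CUP", "N", "OBJECT1"), to_entry, ran_entry})


alternatives = [
    alternative(("TO", "P", "(PATH (FROM PERSON1) (TO ?0))"), ("RAN", "V", "(GO ?0 ?1)")),
    alternative(("TO", "P", "(TO ?0)"), ("RAN", "V", "(GO ?0 ?1)")),
    alternative(("TO", "V", "(TO ?0)"), ("RAN", "V", "(GO ?0 ?1)")),
    alternative(("TO", "N", "(TO ?0)"), ("RAN", "V", "(GO ?0 ?1)")),
    alternative(("TO", "P", "(AT ?0)"), ("RAN", "V", "(BE ?0 ?1)")),
]
for entry in sorted(determined_entries(alternatives)):
    print(*entry)
```

RAN V (GO ?0 ?1) is not reported, because one of the transcribed alternatives assigns RAN the meaning (BE ?0 ?1) instead.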
lcs: (OR (BE OBJECT1 (AT PERSON1))
(GO OBJECT1 (PATH (FROM PERSON1) (TO PERSON2)))
(GO OBJECT1 (FROM PERSON1))
(GO OBJECT1 (TO PERSON2))
(GO OBJECT1 (PATH))
(BE OBJECT1 (AT PERSON2)))
sentence: (THE CUP SLID FROM JOHN TO MARY)




parse:
(OR (S (OR (NP (DET THE)
(N CUP)
(SBAR (S (NP (N SLID)) (VP (V FROM) (NP (N JOHN))))))
(NP (DET THE)
(N CUP)
(OR (PP (P SLID) (NP (N FROM)))
(PP (P SLID) (VP (V FROM)))
(PP (P SLID) (PP (P FROM))))
(NP (N JOHN)))
(NP (DET THE) (N CUP) (NP (N SLID)) (PP (P FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP) (VP (V SLID)) (PP (P FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP) (PP (P SLID)) (PP (P FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP)
(OR (VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM))
(VP (V SLID) (NP (N FROM)))
(VP (V SLID) (VP (V FROM)))
(VP (V SLID) (PP (P FROM))))
(NP (N JOHN)))
(NP (DET THE) (N CUP) (NP (N SLID)) (VP (V FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP) (VP (V SLID)) (VP (V FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP) (PP (P SLID)) (VP (V FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP)
(OR (NP (DET SLID) (N FROM))
(NP (N SLID) (NP (N FROM)))
(NP (N SLID) (VP (V FROM)))
(NP (N SLID) (PP (P FROM))))
(NP (N JOHN)))
(NP (DET THE) (N CUP) (NP (N SLID)) (NP (N FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP) (VP (V SLID)) (NP (N FROM)) (NP (N JOHN)))
(NP (DET THE) (N CUP) (PP (P SLID)) (NP (N FROM)) (NP (N JOHN)))
(NP (DET THE)
(N CUP)
(SBAR (S (NP (N SLID)) (VP (V FROM))))
(NP (N JOHN)))
(NP (DET THE) (N CUP) (PP (P SLID)) (NP (N FROM) (NP (N JOHN))))
(NP (DET THE) (N CUP) (VP (V SLID)) (NP (N FROM) (NP (N JOHN))))
(NP (DET THE) (N CUP) (NP (N SLID)) (NP (N FROM) (NP (N JOHN))))
(NP (DET THE)
(N CUP)
(OR (NP (N SLID) (PP (P FROM)) (NP (N JOHN)))
(NP (N SLID) (VP (V FROM)) (NP (N JOHN)))
(NP (N SLID) (NP (N FROM)) (NP (N JOHN)))
(NP (DET SLID) (N FROM) (NP (N JOHN)))
(NP (N SLID) (NP (N FROM) (NP (N JOHN))))
(NP (N SLID) (VP (V FROM) (NP (N JOHN))))
(NP (N SLID) (PP (P FROM) (NP (N JOHN))))))
(NP (DET THE) (N CUP) (PP (P SLID)) (VP (V FROM) (NP (N JOHN))))
(NP (DET THE) (N CUP) (VP (V SLID)) (VP (V FROM) (NP (N JOHN))))
(NP (DET THE) (N CUP) (NP (N SLID)) (VP (V FROM) (NP (N JOHN))))
(NP (DET THE)
(N CUP)
(OR (VP (V SLID) (PP (P FROM)) (NP (N JOHN)))
(VP (V SLID) (VP (V FROM)) (NP (N JOHN)))
(VP (V SLID) (NP (N FROM)) (NP (N JOHN)))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(NP (N JOHN)))
(VP (V SLID) (NP (N FROM) (NP (N JOHN))))
(VP (V SLID) (VP (V FROM) (NP (N JOHN))))
(VP (V SLID) (PP (P FROM) (NP (N JOHN))))))
(NP (DET THE) (N CUP) (PP (P SLID)) (PP (P FROM) (NP (N JOHN))))
(NP (DET THE) (N CUP) (VP (V SLID)) (PP (P FROM) (NP (N JOHN))))
(NP (DET THE) (N CUP) (NP (N SLID)) (PP (P FROM) (NP (N JOHN))))
(NP (DET THE)
(N CUP)
(OR (PP (P SLID) (PP (P FROM)) (NP (N JOHN)))
(PP (P SLID) (VP (V FROM)) (NP (N JOHN)))
(PP (P SLID) (NP (N FROM)) (NP (N JOHN)))
(PP (P SLID) (NP (N FROM) (NP (N JOHN))))
(PP (P SLID) (VP (V FROM) (NP (N JOHN))))
(PP (P SLID) (PP (P FROM) (NP (N JOHN)))))))
(VP (V TO) (NP (N MARY))))
(S (OR (NP (DET THE) (N CUP) (NP (N SLID)))
(NP (DET THE) (N CUP) (VP (V SLID)))
(NP (DET THE) (N CUP) (PP (P SLID))))
(OR (VP (V FROM) (SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(VP (V FROM) (NP (N JOHN)) (PP (P TO)) (NP (N MARY)))
(VP (V FROM) (NP (N JOHN)) (VP (V TO)) (NP (N MARY)))
(VP (V FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(VP (V FROM) (NP (N JOHN)) (NP (N TO)) (NP (N MARY)))
(VP (V FROM) (SBAR (S (NP (N JOHN)) (VP (V TO)))) (NP (N MARY)))
(VP (V FROM) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(VP (V FROM)
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(VP (V FROM) (NP (N JOHN)) (VP (V TO) (NP (N MARY))))
(VP (V FROM) (NP (N JOHN)) (PP (P TO) (NP (N MARY))))))
(S (NP (DET THE) (N CUP))
(OR (VP (V SLID)
(PP (P FROM))
(SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(VP (V SLID)
(VP (V FROM))
(SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(VP (V SLID)
(NP (N FROM))
(SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(VP (V SLID)
(SBAR (S (NP (N FROM) (NP (N JOHN)))
(VP (V TO) (NP (N MARY))))))


(VP (V SLID)
(OR (PP (P FROM) (SBAR (S (NP (N JOHN)) (VP (V TO)))))
(PP (P FROM) (NP (N JOHN)) (NP (N TO)))
(PP (P FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO)))))
(PP (P FROM) (NP (N JOHN)) (VP (V TO)))
(PP (P FROM) (NP (N JOHN)) (PP (P TO))))
(NP (N MARY)))
(VP (V SLID) (PP (P FROM)) (NP (N JOHN)) (PP (P TO)) (NP (N MARY)))
(VP (V SLID) (VP (V FROM)) (NP (N JOHN)) (PP (P TO)) (NP (N MARY)))
(VP (V SLID) (NP (N FROM)) (NP (N JOHN)) (PP (P TO)) (NP (N MARY)))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(NP (N JOHN))
(PP (P TO))
(NP (N MARY)))
(VP (V SLID) (NP (N FROM) (NP (N JOHN))) (PP (P TO)) (NP (N MARY)))
(VP (V SLID) (VP (V FROM) (NP (N JOHN))) (PP (P TO)) (NP (N MARY)))
(VP (V SLID) (PP (P FROM) (NP (N JOHN))) (PP (P TO)) (NP (N MARY)))
(VP (V SLID)
(OR (VP (V FROM) (SBAR (S (NP (N JOHN)) (VP (V TO)))))
(VP (V FROM) (NP (N JOHN)) (NP (N TO)))
(VP (V FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO)))))
(VP (V FROM) (NP (N JOHN)) (VP (V TO)))
(VP (V FROM) (NP (N JOHN)) (PP (P TO))))
(NP (N MARY)))
(VP (V SLID) (PP (P FROM)) (NP (N JOHN)) (VP (V TO)) (NP (N MARY)))
(VP (V SLID) (VP (V FROM)) (NP (N JOHN)) (VP (V TO)) (NP (N MARY)))
(VP (V SLID) (NP (N FROM)) (NP (N JOHN)) (VP (V TO)) (NP (N MARY)))




(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(NP (N JOHN))
(VP (V TO))
(NP (N MARY)))
(VP (V SLID) (NP (N FROM) (NP (N JOHN))) (VP (V TO)) (NP (N MARY)))
(VP (V SLID) (VP (V FROM) (NP (N JOHN))) (VP (V TO)) (NP (N MARY)))
(VP (V SLID) (PP (P FROM) (NP (N JOHN))) (VP (V TO)) (NP (N MARY)))
(VP (V SLID)
(OR (NP (N FROM) (SBAR (S (NP (N JOHN)) (VP (V TO)))))
(NP (N FROM) (NP (N JOHN)) (NP (N TO)))
(NP (N FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO)))))
(NP (N FROM) (NP (N JOHN)) (VP (V TO)))
(NP (N FROM) (NP (N JOHN)) (PP (P TO))))
(NP (N MARY)))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(VP (V SLID)
(NP (N FROM))
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(VP (V SLID)
(VP (V FROM))
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(VP (V SLID)
(PP (P FROM))
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(VP (V SLID) (PP (P FROM)) (NP (N JOHN)) (NP (N TO)) (NP (N MARY)))
(VP (V SLID) (VP (V FROM)) (NP (N JOHN)) (NP (N TO)) (NP (N MARY)))
(VP (V SLID) (NP (N FROM)) (NP (N JOHN)) (NP (N TO)) (NP (N MARY)))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(NP (N JOHN))
(NP (N TO))
(NP (N MARY)))
(VP (V SLID) (NP (N FROM) (NP (N JOHN))) (NP (N TO)) (NP (N MARY)))
(VP (V SLID) (VP (V FROM) (NP (N JOHN))) (NP (N TO)) (NP (N MARY)))
(VP (V SLID) (PP (P FROM) (NP (N JOHN))) (NP (N TO)) (NP (N MARY)))
(VP (V SLID)
(SBAR (S (NP (N FROM) (NP (N JOHN))) (VP (V TO))))
(NP (N MARY)))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO))))
(NP (N MARY)))
(VP (V SLID)
(NP (N FROM))
(SBAR (S (NP (N JOHN)) (VP (V TO))))
(NP (N MARY)))
(VP (V SLID)
(VP (V FROM))
(SBAR (S (NP (N JOHN)) (VP (V TO))))
(NP (N MARY)))
(VP (V SLID)
(PP (P FROM))
(SBAR (S (NP (N JOHN)) (VP (V TO))))
(NP (N MARY)))
(VP (V SLID) (PP (P FROM) (NP (N JOHN))) (NP (N TO) (NP (N MARY))))
(VP (V SLID) (VP (V FROM) (NP (N JOHN))) (NP (N TO) (NP (N MARY))))
(VP (V SLID) (NP (N FROM) (NP (N JOHN))) (NP (N TO) (NP (N MARY))))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(VP (V SLID) (NP (N FROM)) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(VP (V SLID) (VP (V FROM)) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(VP (V SLID) (PP (P FROM)) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(VP (V SLID)
(PP (P FROM))
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(VP (V SLID)
(VP (V FROM))
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(VP (V SLID)
(NP (N FROM))
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(VP (V SLID)
(OR (NP (N FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(NP (N FROM) (NP (N JOHN)) (PP (P TO)) (NP (N MARY)))
(NP (N FROM) (NP (N JOHN)) (VP (V TO)) (NP (N MARY)))
(NP (N FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(NP (N FROM) (NP (N JOHN)) (NP (N TO)) (NP (N MARY)))
(NP (N FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO))))
(NP (N MARY)))
(NP (N FROM) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(NP (N FROM)
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(NP (N FROM) (NP (N JOHN)) (VP (V TO) (NP (N MARY))))
(NP (N FROM) (NP (N JOHN)) (PP (P TO) (NP (N MARY))))))
(VP (V SLID) (PP (P FROM) (NP (N JOHN))) (VP (V TO) (NP (N MARY))))
(VP (V SLID) (VP (V FROM) (NP (N JOHN))) (VP (V TO) (NP (N MARY))))
(VP (V SLID) (NP (N FROM) (NP (N JOHN))) (VP (V TO) (NP (N MARY))))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(NP (N JOHN))
(VP (V TO) (NP (N MARY))))
(VP (V SLID) (NP (N FROM)) (NP (N JOHN)) (VP (V TO) (NP (N MARY))))
(VP (V SLID) (VP (V FROM)) (NP (N JOHN)) (VP (V TO) (NP (N MARY))))
(VP (V SLID) (PP (P FROM)) (NP (N JOHN)) (VP (V TO) (NP (N MARY))))
(VP (V SLID)
(OR (VP (V FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(VP (V FROM) (NP (N JOHN)) (PP (P TO)) (NP (N MARY)))
(VP (V FROM) (NP (N JOHN)) (VP (V TO)) (NP (N MARY)))
(VP (V FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(VP (V FROM) (NP (N JOHN)) (NP (N TO)) (NP (N MARY)))
(VP (V FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO))))
(NP (N MARY)))
(VP (V FROM) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(VP (V FROM)
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(VP (V FROM) (NP (N JOHN)) (VP (V TO) (NP (N MARY))))
(VP (V FROM) (NP (N JOHN)) (PP (P TO) (NP (N MARY))))))
(VP (V SLID) (PP (P FROM) (NP (N JOHN))) (PP (P TO) (NP (N MARY))))
(VP (V SLID) (VP (V FROM) (NP (N JOHN))) (PP (P TO) (NP (N MARY))))
(VP (V SLID) (NP (N FROM) (NP (N JOHN))) (PP (P TO) (NP (N MARY))))
(VP (OR (AUX (DO SLID))
(AUX (BE SLID))
(AUX (MODAL SLID))
(AUX (TO SLID))
(AUX (HAVE SLID)))
(V FROM)
(NP (N JOHN))
(PP (P TO) (NP (N MARY))))
(VP (V SLID) (NP (N FROM)) (NP (N JOHN)) (PP (P TO) (NP (N MARY))))
(VP (V SLID) (VP (V FROM)) (NP (N JOHN)) (PP (P TO) (NP (N MARY))))
(VP (V SLID) (PP (P FROM)) (NP (N JOHN)) (PP (P TO) (NP (N MARY))))


(VP (V SLID)
(OR (PP (P FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO) (NP (N MARY))))))
(PP (P FROM) (NP (N JOHN)) (PP (P TO)) (NP (N MARY)))
(PP (P FROM) (NP (N JOHN)) (VP (V TO)) (NP (N MARY)))
(PP (P FROM)
(OR (NP (N JOHN) (NP (N TO)))
(NP (N JOHN) (VP (V TO)))
(NP (N JOHN) (PP (P TO))))
(NP (N MARY)))
(PP (P FROM) (NP (N JOHN)) (NP (N TO)) (NP (N MARY)))
(PP (P FROM)
(SBAR (S (NP (N JOHN)) (VP (V TO))))
(NP (N MARY)))
(PP (P FROM) (NP (N JOHN)) (NP (N TO) (NP (N MARY))))
(PP (P FROM)
(OR (NP (N JOHN) (PP (P TO)) (NP (N MARY)))
(NP (N JOHN) (VP (V TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO)) (NP (N MARY)))
(NP (N JOHN) (NP (N TO) (NP (N MARY))))
(NP (N JOHN) (VP (V TO) (NP (N MARY))))
(NP (N JOHN) (PP (P TO) (NP (N MARY))))))
(PP (P FROM) (NP (N JOHN)) (VP (V TO) (NP (N MARY))))
(PP (P FROM) (NP (N JOHN)) (PP (P TO) (NP (N MARY)))))))))
fracture: (AND (DEFINITION THE DET SEMANTICLESS)
(DEFINITION CUP N OBJECT1)
(OR (AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM N (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO P (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO P (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM P (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO P (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM N (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO V (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO V (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM P (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO V (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM N (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO N (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM V (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO N (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))
(AND (DEFINITION JOHN N PERSON1)
(DEFINITION FROM P (FROM ?0))
(DEFINITION MARY N PERSON2)
(DEFINITION TO N (TO ?0))
(DEFINITION SLID V (GO ?0 (PATH ?1 ?2))))))
Ics: (OR (ORIENT PERSON1 (TO PERSON2))
(ORIENT PERSON2 (TO PERSON3))
(ORIENT PERSON3 (TO PERSON1)))
sentence:  (JOHN  FACED  MARY) 

parse:  (S  (NP  (N  JOHN))  (VP  (V  FACED)  (NP  (N  MARY)))) 
fracture: (AND (DEFINITION JOHN N PERSON1)
(DEFINITION MARY N PERSON2)
(DEFINITION FACED V (ORIENT ?0 (TO ?1))))
FACED: [V] (ORIENT ?0 (TO ?1))
SLID: [V] (GO ?0 (PATH ?1 ?2))
FROM: *[?] (FROM ?0)
TO: *[N] (TO ?0)
RAN: [V] (GO ?0 ?1)
THE: [DET] SEMANTICLESS
CUP: [N] OBJECT1
BILL: [N] PERSON3
MARY: [N] PERSON2
JOHN: [N] PERSON1
ROLLED: [V] (GO ?0 (PATH))


Appendix  B 

Kenunia  in  Operation 


This appendix contains a trace of Kenunia processing the corpus from figure 4.8 using the prior semantic
knowledge from figure 4.10. Given this information, Kenunia can derive the syntactic parameter settings
and word-to-category mappings illustrated in figure 4.11. The trace follows Kenunia through the corpus
utterance by utterance, showing the interim language model after each utterance as well as the
hypothesized analysis of that utterance. When no analysis is possible, the propositions that must be
retracted from the language model are listed as culprits, and the utterance is then processed again.
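The retract-and-retry pattern visible in this trace can be illustrated with a deliberately simplified sketch. It is not Kenunia's algorithm. Here the language model is just a dictionary of propositions such as `category(-ed)`, and each utterance is reduced to the propositions its analysis would require. The hypothetical `analyze` function treats any conflict with the current model as a failed analysis whose conflicting propositions are the culprits. Kenunia's actual culprit identification depends on its parser and is not reproduced here. The names `Model`, `analyze` and `process` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy representation: proposition name -> value,
# e.g. "category(-ed)" -> "V". This is not Kenunia's actual data structure.
Model = Dict[str, str]


def analyze(required: Model, model: Model) -> List[str]:
    """Toy stand-in for parsing one utterance.

    Returns the culprits: propositions already in the model whose values
    conflict with what this utterance's analysis requires. An empty list
    means the utterance can be analyzed.
    """
    return sorted(p for p, v in required.items() if p in model and model[p] != v)


def process(corpus: List[Tuple[str, Model]], model: Model) -> List[str]:
    """Process utterances in order, retracting culprits and retrying on failure."""
    log: List[str] = []
    for text, required in corpus:
        culprits = analyze(required, model)
        if culprits:
            log.append(f"{text} culprits: {', '.join(culprits)}")
            for p in culprits:
                del model[p]  # retract the offending propositions
            if analyze(required, model):  # retry the same utterance
                raise RuntimeError(f"could not analyze {text!r}")
        model.update(required)  # commit the propositions this analysis needed
        log.append(f"{text} analyzed")
    return log


if __name__ == "__main__":
    # Loosely modeled on the trace: -ed is first taken to be a V and later
    # retracted when "John face -ed Mary." requires it to be an I.
    corpus = [
        ("John roll -ed.", {"category(-ed)": "V", "category(roll)": "I"}),
        ("John face -ed Mary.", {"category(-ed)": "I", "category(face)": "V"}),
    ]
    for line in process(corpus, {}):
        print(line)
```

In this toy, retracting every conflicting proposition always lets the retry succeed. In the real trace, a retraction can instead leave a word's category unknown (`[X"]`) until a later utterance determines it.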

John roll -ed.
{Agent : person1, Theme : person1}

Syntactic Parameters:
[I° initial]
[I' final]
[C° final]

Lexicon:
cup: [X"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [X"] ⊥{Source : 0}
Bill: [X"] person3{}
the: [X"] ⊥{}
Mary: [X"] person2{}
to: [X"] ⊥{Goal : 0}
run: [X"] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

Mary roll -ed.
{Agent : person2, Theme : person2}

Syntactic Parameters:
[I° initial]
[I' final]
[C° final]

Lexicon:
cup: [X"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [X"] ⊥{Source : 0}
Bill: [X"] person3{}
the: [X"] ⊥{}
Mary: [D"] person2{}
to: [X"] ⊥{Goal : 0}
run: [X"] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

Bill roll -ed.
{Agent : person3, Theme : person3}

Syntactic Parameters:
[I° initial]
[I' final]
[C° final]

Lexicon:
cup: [X"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [X"] ⊥{Source : 0}
Bill: [D"] person3{}
the: [X"] ⊥{}
Mary: [D"] person2{}
to: [X"] ⊥{Goal : 0}
run: [X"] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

The cup roll -ed.
{Theme : object1}

Syntactic Parameters:
[D° initial]
[I° initial]
[I' final]
[C° final]

Lexicon:
cup: [N"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [X"] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [X"] ⊥{Goal : 0}
run: [X"] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

Bill run -ed to Mary.
{Agent : person3, Theme : person3, Goal : person2}

Syntactic Parameters:
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]

Lexicon:
cup: [N"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [X"] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

Bill run -ed from John.
{Agent : person3, Theme : person3, Source : person1}

Syntactic Parameters:
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]

Lexicon:
cup: [N"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

Bill run -ed to the cup.
{Agent : person3, Theme : person3, Goal : object1}

Syntactic Parameters:
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]

Lexicon:
cup: [N"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}
The cup slide -ed from John to Mary.
{Theme : object1, Source : person1, Goal : person2}

Syntactic Parameters:
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]

Lexicon:
cup: [N"] object1{}
-ed: [V"] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

John face -ed Mary.
{Agent : person1, Patient : person1, Goal : person2}




Culprits:
category(-ed) = V
bar-level(-ed) = 2

Syntactic Parameters:
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]

Lexicon:
cup: [N"] object1{}
-ed: [X"] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [X"] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}
John face -ed Mary.
{Agent : person1, Patient : person1, Goal : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [I°] ⊥{Theme : 1}

John roll -ed.
{Agent : person1, Theme : person1}

Culprits:
category(roll) = I

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [X"] ⊥{Theme : 1}

John roll -ed.
{Agent : person1, Theme : person1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Mary roll -ed.
{Agent : person2, Theme : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill roll -ed.
{Agent : person3, Theme : person3}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

The cup roll -ed.
{Theme : object1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [I°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill run -ed to Mary.
{Agent : person3, Theme : person3, Goal : person2}

Culprits:
category(run) = I

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [X"] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill run -ed to Mary.
{Agent : person3, Theme : person3, Goal : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill run -ed from John.
{Agent : person3, Theme : person3, Source : person1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill run -ed to the cup.
{Agent : person3, Theme : person3, Goal : object1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [I°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

The cup slide -ed from John to Mary.
{Theme : object1, Source : person1, Goal : person2}

Culprits:
category(slide) = I

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [X"] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

The cup slide -ed from John to Mary.
{Theme : object1, Source : person1, Goal : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

John face -ed Mary.
{Agent : person1, Patient : person1, Goal : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}
John roll -ed.
{Agent : person1, Theme : person1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Mary roll -ed.
{Agent : person2, Theme : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}
Bill roll -ed.
{Agent : person3, Theme : person3}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}
The cup roll -ed.
{Theme : object1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill run -ed to Mary.
{Agent : person3, Theme : person3, Goal : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill run -ed from John.
{Agent : person3, Theme : person3, Source : person1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

Bill run -ed to the cup.
{Agent : person3, Theme : person3, Goal : object1}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}
The cup slide -ed from John to Mary.
{Theme : object1, Source : person1, Goal : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}

John face -ed Mary.
{Agent : person1, Patient : person1, Goal : person2}

Syntactic Parameters:
[V° final]
[V' final]
[P° initial]
[D° initial]
[I° initial]
[I' final]
[C° final]
[adjoin V" right]
[adjoin I° left]

Lexicon:
cup: [N"] object1{}
-ed: [I°] ⊥{}
John: [D"] person1{}
slide: [V°] ⊥{Theme : 1}
that: [X"] ⊥{}
∅: [C°] ⊥{}
face: [V°] ⊥{Patient : 1, Goal : 0}
from: [P°] ⊥{Source : 0}
Bill: [D"] person3{}
the: [D°] ⊥{}
Mary: [D"] person2{}
to: [P°] ⊥{Goal : 0}
run: [V°] ⊥{Theme : 1}
roll: [V°] ⊥{Theme : 1}
Appendix C


Abigail  in  Operation 


This appendix enumerates the perceptual primitives recovered by Abigail after processing the first
172 frames of the movie discussed in section 6.1. Each entry gives the inclusive range of frames
[start,end] over which a primitive holds. Figure 8.13 contains an event graph depicting the
temporal structure of these primitives.
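Each entry in the listing has the form `[start,end] (PREDICATE arguments)`. As an illustration only, a small parser could load such a listing and report which primitives hold at a given frame. The `Primitive` class and the functions `parse_primitives` and `holding_at` are hypothetical names. The sketch assumes that intervals are inclusive frame ranges, as the `[0,171]` entries for a 172-frame movie suggest. It does not reproduce how Abigail computes or stores these primitives.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Matches lines such as "[66,71] (MOVING [BALL-part])".
_ENTRY = re.compile(r"\[(\d+),(\d+)\]\s*\((\S+)\s*(.*)\)\s*$")


@dataclass(frozen=True)
class Primitive:
    start: int      # first frame (inclusive)
    end: int        # last frame (inclusive)
    predicate: str  # e.g. "SUPPORTS"
    args: str       # raw argument text, e.g. "[TABLE BOX-part] [BALL-part]"


def parse_primitives(lines: Iterable[str]) -> List[Primitive]:
    """Parse listing lines, silently skipping anything that does not match."""
    prims = []
    for line in lines:
        m = _ENTRY.match(line.strip())
        if m:
            prims.append(Primitive(int(m.group(1)), int(m.group(2)),
                                   m.group(3), m.group(4).strip()))
    return prims


def holding_at(prims: Iterable[Primitive], frame: int) -> List[Primitive]:
    """Return the primitives whose inclusive interval contains the given frame."""
    return [p for p in prims if p.start <= frame <= p.end]
```

The sketch keeps the argument text unparsed so that it stays short. A fuller version would split the bracketed object lists into individual object names.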

[0,0] (PLACE [JOHN-part] PLACE-0)
[0,0] (SUPPORTED [JOHN-part])
[0,1] (PLACE [(EYE JOHN)] PLACE-1)
[0,65] (PLACE [BALL-part] PLACE-13)
[0,65] (CONTACTS [TABLE BOX-part] [BALL-part])
[0,65] (SUPPORTS [TABLE BOX-part] [BALL-part])
[0,65] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-11)
[0,71] (SUPPORTED [BALL-part])
[0,71] (SUPPORTED [(LINE-SEGMENT3 BALL)])
[0,71] (SUPPORTS [BALL-part] [(LINE-SEGMENT3 BALL)])
[0,171] (SUPPORTED [TABLE BOX-part])
[0,171] (SUPPORTED [(BOTTOM BOX)])
[0,171] (SUPPORTS [TABLE BOX-part] [(BOTTOM BOX)])
[1,64] (MOVING [JOHN-part])
[2,2] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[2,2] (ROTATING [JOHN-part])
[2,15] (MOVING-ROOT [JOHN-part])
[2,15] (TRANSLATING [(EYE JOHN)] PLACE-2)
[2,15] (MOVING-ROOT [(EYE JOHN)])
[2,15] (MOVING [(EYE JOHN)])
[2,60] (TRANSLATING [JOHN-part] PLACE-9)
[16,16] (SUPPORTED [JOHN-part])
[16,17] (PLACE [(EYE JOHN)] PLACE-3)
[18,18] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[18,18] (ROTATING [JOHN-part])
[18,32] (MOVING-ROOT [JOHN-part])
[18,32] (TRANSLATING [(EYE JOHN)] PLACE-4)
[18,32] (MOVING-ROOT [(EYE JOHN)])
[18,32] (MOVING [(EYE JOHN)])
[33,33] (SUPPORTED [JOHN-part])
[33,34] (PLACE [(EYE JOHN)] PLACE-5)
[35,35] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[35,35] (ROTATING [JOHN-part])
[35,48] (MOVING-ROOT [JOHN-part])
[35,48] (TRANSLATING [(EYE JOHN)] PLACE-6)
[35,48] (MOVING-ROOT [(EYE JOHN)])
[35,48] (MOVING [(EYE JOHN)])
[49,49] (SUPPORTED [JOHN-part])
[49,50] (PLACE [(EYE JOHN)] PLACE-7)
[51,51] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[51,51] (ROTATING [JOHN-part])
[51,58] (MOVING-ROOT [JOHN-part])
[51,58] (TRANSLATING [(EYE JOHN)] PLACE-8)
[51,58] (MOVING-ROOT [(EYE JOHN)])
[51,58] (MOVING [(EYE JOHN)])
[59,64] (SUPPORTED [JOHN-part])
[59,70] (PLACE [(EYE JOHN)] PLACE-16)
[64,64] (TRANSLATING [JOHN-part] PLACE-10)
[65,66] (PLACE [JOHN-part] PLACE-12)
[66,71] (TRANSLATING [BALL-part] PLACE-19)
[66,71] (MOVING-ROOT [BALL-part])
[66,71] (MOVING [BALL-part])
[66,71] (SUPPORTED [JOHN-part])
[66,71] (SUPPORTS [JOHN-part] [BALL-part])
[66,71] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-17)
[66,71] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])
[66,71] (MOVING [(LINE-SEGMENT3 BALL)])
[66,71] (SUPPORTS [(LINE-SEGMENT3 BALL)] [BALL-part])
[66,71] (SUPPORTED [BALL-part JOHN-part])
[66,71] (SUPPORTS [BALL-part JOHN-part] [(LINE-SEGMENT3 BALL)])
[67,67] (TRANSLATING [JOHN-part] PLACE-15)
[67,67] (TRANSLATING [BALL-part JOHN-part] PLACE-14)
[71,71] (FLIPPING [BALL-part])
[71,71] (ROTATING-COUNTER-CLOCKWISE [BALL-part])
[71,71] (ROTATING [BALL-part])
[71,71] (FLIPPING [JOHN-part])
[71,71] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])
[71,71] (ROTATING-CLOCKWISE [JOHN-part])
[71,71] (ROTATING [JOHN-part])
[71,71] (MOVING-ROOT [JOHN-part])
[71,71] (SUPPORTS [BALL-part] [JOHN-part])
[71,71] (TRANSLATING [(EYE JOHN)] PLACE-18)
[71,71] (ROTATING-COUNTER-CLOCKWISE [(EYE JOHN)])
[71,71] (ROTATING [(EYE JOHN)])
[71,71] (MOVING-ROOT [(EYE JOHN)])
[71,71] (MOVING [(EYE JOHN)])
[71,71] (ROTATING-CLOCKWISE [(LINE-SEGMENT3 BALL)])
[71,71] (ROTATING [(LINE-SEGMENT3 BALL)])
[71,71] (FLIPPING [BALL-part JOHN-part])
[71,71] (ROTATING-COUNTER-CLOCKWISE [BALL-part JOHN-part])
[71,71] (ROTATING-CLOCKWISE [BALL-part JOHN-part])
[71,71] (ROTATING [BALL-part JOHN-part])
[71,71] (MOVING-ROOT [BALL-part JOHN-part])
[71,71] (SUPPORTS [(LINE-SEGMENT3 BALL)] [BALL-part JOHN-part])
[72,72] (PLACE [BALL-part] PLACE-22)
[72,72] (PLACE [(EYE JOHN)] PLACE-21)
[72,72] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-20)
[73,80] (TRANSLATING [BALL-part] PLACE-25)
[73,80] (MOVING-ROOT [BALL-part])
[73,80] (MOVING [BALL-part])
[73,80] (MOVING-ROOT [JOHN-part])
[73,80] (TRANSLATING [(EYE JOHN)] PLACE-24)
[73,80] (MOVING-ROOT [(EYE JOHN)])
[73,80] (MOVING [(EYE JOHN)])
[73,80] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-23)
[73,80] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])
[73,80] (MOVING [(LINE-SEGMENT3 BALL)])
[73,80] (MOVING-ROOT [BALL-part JOHN-part])
[73,80] (MOVING-ROOT [BALL JOHN-part])
[81,82] (PLACE [BALL-part] PLACE-28)
[81,82] (PLACE [(EYE JOHN)] PLACE-27)
[81,82] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-26)
[83,83] (ROTATING-CLOCKWISE [JOHN-part])
[83,83] (ROTATING [JOHN-part])
[83,83] (ROTATING-CLOCKWISE [BALL-part JOHN-part])
[83,83] (ROTATING [BALL-part JOHN-part])
[83,83] (ROTATING-CLOCKWISE [BALL JOHN-part])
[83,83] (ROTATING [BALL JOHN-part])
[83,97] (TRANSLATING [BALL-part] PLACE-31)
[83,97] (MOVING-ROOT [BALL-part])
[83,97] (MOVING [BALL-part])
[83,97] (MOVING-ROOT [JOHN-part])
[83,97] (TRANSLATING [(EYE JOHN)] PLACE-30)
[83,97] (MOVING-ROOT [(EYE JOHN)])
[83,97] (MOVING [(EYE JOHN)])
[83,97] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-29)
[83,97] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])
[83,97] (MOVING [(LINE-SEGMENT3 BALL)])
[83,97] (MOVING-ROOT [BALL-part JOHN-part])
[83,97] (MOVING-ROOT [BALL JOHN-part])
[98,99] (PLACE [BALL-part] PLACE-34)
[98,99] (PLACE [(EYE JOHN)] PLACE-33)
[98,99] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-32)
[100,100] (ROTATING-CLOCKWISE [JOHN-part])
[100,100] (ROTATING [JOHN-part])
[100,100] (ROTATING-CLOCKWISE [BALL-part JOHN-part])
[100,100] (ROTATING [BALL-part JOHN-part])
[100,100] (ROTATING-CLOCKWISE [BALL JOHN-part])
[100,100] (ROTATING [BALL JOHN-part])
[100,113] (TRANSLATING [BALL-part] PLACE-37)
[100,113] (MOVING-ROOT [BALL-part])
[100,113] (MOVING [BALL-part])
[100,113] (MOVING-ROOT [JOHN-part])
[100,113] (TRANSLATING [(EYE JOHN)] PLACE-36)
[100,113] (MOVING-ROOT [(EYE JOHN)])
[100,113] (MOVING [(EYE JOHN)])
[100,113] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-35)
[100,113] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])
[100,113] (MOVING [(LINE-SEGMENT3 BALL)])
[100,113] (MOVING-ROOT [BALL-part JOHN-part])
[100,113] (MOVING-ROOT [BALL JOHN-part])
[114,115] (PLACE [BALL-part] PLACE-40)
[114,115] (PLACE [(EYE JOHN)] PLACE-39)
[114,115] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-38)
[116,116] (ROTATING-CLOCKWISE [JOHN-part])
[116,116] (ROTATING [JOHN-part])
[116,116] (ROTATING-CLOCKWISE [BALL-part JOHN-part])
[116,116] (ROTATING [BALL-part JOHN-part])
[116,116] (ROTATING-CLOCKWISE [BALL JOHN-part])
[116,116] (ROTATING [BALL JOHN-part])
[116,130] (TRANSLATING [BALL-part] PLACE-43)
[116,130] (MOVING-ROOT [BALL-part])
[116,130] (MOVING [BALL-part])
[116,130] (MOVING-ROOT [JOHN-part])
[116,130] (TRANSLATING [(EYE JOHN)] PLACE-42)
[116,130] (MOVING-ROOT [(EYE JOHN)])
[116,130] (MOVING [(EYE JOHN)])
[116,130] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-41)
[116,130] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])
[116,130] (MOVING [(LINE-SEGMENT3 BALL)])
[116,130] (MOVING-ROOT [BALL-part JOHN-part])
[116,130] (MOVING-ROOT [BALL JOHN-part])
[131,131] (PLACE [BALL-part] PLACE-46)
[131,131] (PLACE [(EYE JOHN)] PLACE-45)
[131,131] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-44)
[132,132] (FLIPPING [BALL-part])

[132,132] (TRANSLATING [BALL-part] PLACE-49)

[132,132] (ROTATING-COUNTER-CLOCKWISE [BALL-part])

[132,132] (ROTATING-CLOCKWISE [BALL-part])

[132,132] (ROTATING [BALL-part])

[132,132] (MOVING-ROOT [BALL-part])

[132,132] (MOVING [BALL-part])

[132,132] (FLIPPING [JOHN-part])

[132,132] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])

[132,132] (ROTATING-CLOCKWISE [JOHN-part])

[132,132] (ROTATING [JOHN-part])

[132,132] (MOVING-ROOT [JOHN-part])

[132,132] (TRANSLATING [(EYE JOHN)] PLACE-48)

[132,132] (ROTATING-CLOCKWISE [(EYE JOHN)])

[132,132] (ROTATING [(EYE JOHN)])

[132,132] (MOVING-ROOT [(EYE JOHN)])

[132,132] (MOVING [(EYE JOHN)])

[132,132] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-47)

[132,132] (ROTATING-COUNTER-CLOCKWISE [(LINE-SEGMENT3 BALL)])

[132,132] (ROTATING [(LINE-SEGMENT3 BALL)])

[132,132] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])

[132,132] (MOVING [(LINE-SEGMENT3 BALL)])

[132,132] (FLIPPING [BALL-part JOHN-part])

[132,132] (ROTATING-COUNTER-CLOCKWISE [BALL-part JOHN-part])

[132,132] (ROTATING-CLOCKWISE [BALL-part JOHN-part])

[132,132] (ROTATING [BALL-part JOHN-part])

[132,132] (MOVING-ROOT [BALL-part JOHN-part])

[132,132] (FLIPPING [BALL JOHN-part])

[132,132] (ROTATING-COUNTER-CLOCKWISE [BALL JOHN-part])

[132,132] (ROTATING-CLOCKWISE [BALL JOHN-part])

[132,132] (ROTATING [BALL JOHN-part])

[132,132] (MOVING-ROOT [BALL JOHN-part])
[133,133] (PLACE [BALL-part] PLACE-52)

[133,133] (PLACE [(EYE JOHN)] PLACE-51)

[133,133] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-50)

[134,134] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])

[134,134] (ROTATING [JOHN-part])

[134,134] (ROTATING-COUNTER-CLOCKWISE [BALL-part JOHN-part])

[134,134] (ROTATING [BALL-part JOHN-part])

[134,134] (ROTATING-COUNTER-CLOCKWISE [BALL JOHN-part])

[134,134] (ROTATING [BALL JOHN-part])
[134,147] (TRANSLATING [BALL-part] PLACE-55)

[134,147] (MOVING-ROOT [BALL-part])

[134,147] (MOVING [BALL-part])

[134,147] (MOVING-ROOT [JOHN-part])

[134,147] (TRANSLATING [(EYE JOHN)] PLACE-54)

[134,147] (MOVING-ROOT [(EYE JOHN)])

[134,147] (MOVING [(EYE JOHN)])

[134,147] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-53)

[134,147] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])

[134,147] (MOVING [(LINE-SEGMENT3 BALL)])

[134,147] (MOVING-ROOT [BALL-part JOHN-part])

[134,147] (MOVING-ROOT [BALL JOHN-part])

[148,149] (PLACE [BALL-part] PLACE-58)

[148,149] (PLACE [(EYE JOHN)] PLACE-57)

[148,149] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-56)

[150,150] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])

[150,150] (ROTATING [JOHN-part])

[150,150] (ROTATING-COUNTER-CLOCKWISE [BALL-part JOHN-part])

[150,150] (ROTATING [BALL-part JOHN-part])

[150,150] (ROTATING-COUNTER-CLOCKWISE [BALL JOHN-part])

[150,150] (ROTATING [BALL JOHN-part])

[150,163] (TRANSLATING [BALL-part] PLACE-61)

[150,163] (MOVING-ROOT [BALL-part])

[150,163] (MOVING [BALL-part])

[150,163] (MOVING-ROOT [JOHN-part])

[150,163] (TRANSLATING [(EYE JOHN)] PLACE-60)

[150,163] (MOVING-ROOT [(EYE JOHN)])

[150,163] (MOVING [(EYE JOHN)])

[150,163] (TRANSLATING [(LINE-SEGMENT3 BALL)] PLACE-59)

[150,163] (MOVING-ROOT [(LINE-SEGMENT3 BALL)])

[150,163] (MOVING [(LINE-SEGMENT3 BALL)])

[150,163] (MOVING-ROOT [BALL-part JOHN-part])

[150,163] (MOVING-ROOT [BALL JOHN-part])

[164,165] (PLACE [BALL-part] PLACE-64)

[164,165] (PLACE [(EYE JOHN)] PLACE-63)

[164,165] (PLACE [(LINE-SEGMENT3 BALL)] PLACE-62)

[166,166] (ROTATING-COUNTER-CLOCKWISE [JOHN-part])

[166,166] (ROTATING [JOHN-part])

[166,166] (ROTATING-COUNTER-CLOCKWISE [BALL-part JOHN-part])

[166,166] (ROTATING [BALL-part JOHN-part])

[166,166] (ROTATING-COUNTER-CLOCKWISE [BALL JOHN-part])

[166,166] (ROTATING [BALL JOHN-part])